PRELIMINARY AMENDMENT



Sir: 



Prior to Examination, please amend the above-identified application as follows:
In the Specification: 

Page 1, at line 2, please insert the following paragraph: 
--BACKGROUND OF THE INVENTION--

Page 2, at line 30, please insert the following paragraph: 
--SUMMARY OF THE INVENTION--

Page 5, delete lines 5-8. 

Page 5, at line 9, please insert the following paragraph: 
--BRIEF DESCRIPTION OF THE DRAWINGS--

Page 6, at line 17, insert the following paragraph: 
--DETAILED DESCRIPTION--

In the Abstract: 

--The N samples of a frame of an audio signal are weighted by an analysis window of
Hamming, Hanning, Kaiser or similar type. A spectrum of the audio signal is calculated by
transforming each frame of weighted samples in the frequency domain, and the spectrum of
the audio signal is processed to deliver parameters for synthesizing a signal derived from the
analyzed audio signal. The successive frames comprise an alternation of frames for which are
delivered complete sets of synthesis parameters and of frames for which are delivered
incomplete sets of synthesis parameters. The successive frames for which complete sets of
synthesis parameters are delivered exhibit mutual overlaps of less than N/2 samples.--

In the Claims: 

Amend the following claims: 

1. (Amended) A method of analyzing an audio signal processed by successive 
frames of N samples, N being an integer greater than 1, comprising the steps of: 

weighting the samples of each frame by an analysis window of Hamming, Hanning,
Kaiser or similar type; 

calculating a spectrum of the audio signal by transforming each frame of weighted 
samples in the frequency domain; and 

processing the spectrum of the audio signal to deliver synthesis parameters for a 
signal derived from the analyzed audio signal; 

wherein the successive frames comprise an alternation of frames for which complete 
sets of synthesis parameters are delivered and of frames for which incomplete sets of 
synthesis parameters are delivered, and wherein the successive frames for which complete 
sets of synthesis parameters are delivered exhibit mutual overlaps of less than N/2 samples. 

2. (Amended) The method as claimed in claim 1, wherein the incomplete sets of 
synthesis parameters include data representing an error of interpolation of at least one of the 
synthesis parameters. 

3. (Amended) The method as claimed in claim 1, wherein the incomplete sets of 
synthesis parameters include data representing a filter for interpolating at least one of the 
synthesis parameters. 

4. (Amended) The method as claimed in claim 1, wherein the processing of the 
spectrum of the audio signal comprises extracting coding parameters for transmitting and/or 
storing a coded audio signal. 

5. (Amended) The method as claimed in claim 1, wherein the processing of the 
spectrum of the audio signal comprises a denoising operation by spectral subtraction. 



6. (Amended) An audio processing device, for analyzing an audio signal by 
successive frames of N samples, N being an integer greater than 1, comprising: 

means for weighting the samples of each frame by an analysis window of Hamming, 
Hanning, Kaiser or similar type; 

means for calculating a spectrum of the audio signal by transforming each frame of 
weighted samples in the frequency domain; and 

means for processing the spectrum of the audio signal to deliver synthesis parameters 
for a signal derived from the analyzed audio signal; 

wherein the successive frames comprise an alternation of frames for which complete 
sets of synthesis parameters are delivered and of frames for which incomplete sets of 
synthesis parameters are delivered, and wherein the successive frames for which complete 
sets of synthesis parameters are delivered exhibit mutual overlaps of less than N/2 samples. 

7. (Amended) The device as claimed in claim 6, wherein the incomplete sets of 
synthesis parameters include data representing an error of interpolation of at least one of the 
synthesis parameters. 

8. (Amended) The device as claimed in claim 6, wherein the incomplete sets of 
synthesis parameters include data representing a filter for interpolating at least one of the 
synthesis parameters. 

9. (Amended) The device as claimed in claim 6, wherein the processing means 
comprise means for extracting coding parameters for transmitting and/or storing a coded 
audio signal. 

10. (Amended) The device as claimed in claim 6, wherein the processing means 
comprise spectral subtraction means for cancelling noise in the audio signal. 

11. (Amended) A method of synthesizing an audio signal, comprising the steps of:

obtaining successive spectral estimates respectively corresponding to frames of N
samples of the audio signal weighted by an analysis window, N being an integer greater than 1;

evaluating each frame of the audio signal by transforming the spectral estimates in the
time domain;

modifying each evaluated frame by applying thereto a processing corresponding to a 
division by said analysis window and to a multiplication by a synthesis window; and 

synthesizing the audio signal as an overlap sum of the modified frames, 

wherein the successive frames exhibit mutual overlaps of L samples, L being an 
integer greater than 1 and smaller than N/2, 

and wherein the synthesis window f_S(i) satisfies f_S(N-L+i) + f_S(i) = A for 0 < i < L,
and f_S(i) = A for L < i < N-L, A being a positive constant and i being a sample rank in a
frame with 0 < i < N.

12. (Amended) The method as claimed in claim 11, wherein the synthesis
window f_S(i) increases from 0 to A for i ranging from 0 to L.

13. (Amended) The method as claimed in claim 12, wherein the synthesis
window f_S(i) for 0 < i < L is a raised half-sinusoid.

14. (Amended) An audio processing device, comprising: 

means for obtaining successive spectral estimates respectively corresponding to 
frames of N samples of an audio signal weighted by an analysis window, N being an integer 
greater than 1; 

means for evaluating each frame of the audio signal by transforming the spectral 
estimates in the time domain; 

means for modifying each evaluated frame by applying thereto a processing 
corresponding to a division by said analysis window and to a multiplication by a synthesis 
window; and 

means for synthesizing the audio signal as an overlap sum of the modified frames,

wherein the successive frames exhibit mutual overlaps of L samples, L being an
integer greater than 1 and smaller than N/2,

and wherein the synthesis window f_S(i) satisfies f_S(N-L+i) + f_S(i) = A for 0 < i < L,
and f_S(i) = A for L < i < N-L, A being a positive constant and i being a sample rank in a
frame with 0 < i < N.

15. (Amended) The device as claimed in claim 14, wherein the synthesis window
f_S(i) increases from 0 to A for i ranging from 0 to L.

16. (Amended) The device as claimed in claim 15, wherein the synthesis window
f_S(i) for 0 < i < L is a raised half-sinusoid.

Add the following claims: 

17. (New) A method of synthesizing an audio signal, comprising the steps of:

defining a set of successive overlapping frames of N samples of the audio signal, N
being an integer greater than 1;

obtaining spectral estimates for a subset of the frames by processing synthesis 
parameters respectively associated with the frames of said subset; 

obtaining spectral estimates for the frames of the set which are not in said subset, with 
an interpolation of at least part of the synthesis parameters; 

evaluating the frames of the set weighted by an analysis window, by transforming in 
the time domain the spectral estimates respectively obtained for said frames; and 

modifying each evaluated frame by applying thereto a processing corresponding to a 
division by said analysis window and to a multiplication by a synthesis window; and 

synthesizing the audio signal as an overlap sum of the modified frames, 

wherein the successive frames of said subset exhibit mutual time shifts of M samples, 
M being an integer greater than N/2, while the successive frames of said set exhibit mutual 
time shifts of M/p samples, p being an integer larger than 1, 

and wherein, the samples of a frame having ranks i numbered from 0 to N-1, the
synthesis window f'_S(i) has a support limited to the ranks i ranging from N/2-M/p to
N/2+M/p and satisfies f'_S(i) + f'_S(i+M/p) = A for N/2-M/p < i < N/2, A being a positive
constant.

18. (New) The method as claimed in claim 17, wherein the synthesis window
f'_S(i) increases for i ranging from N/2-M/p to N/2.

19. (New) The method as claimed in claim 18, wherein the synthesis window
f'_S(i) is a raised sinusoid for N/2-M/p < i < N/2+M/p.

20. (New) The method as claimed in claim 17, further comprising the steps of:

associating data representing an interpolation error with the frames which are not in
said subset; and

correcting at least one of the interpolated synthesis parameters by means of said data. 

21. (New) The method as claimed in claim 17, further comprising the steps of:

associating data representing an interpolator filter with the frames which are not in
said subset; and

interpolating at least one of the synthesis parameters by means of the interpolator 
filter represented by said data. 

22. (New) The method as claimed in claim 17, wherein the synthesis parameters 
comprise cepstral coefficients subjected to the interpolation. 

23. (New) An audio processing device, comprising: 

framing means for defining a set of successive overlapping frames of N samples of an 
audio signal, N being an integer greater than 1;

means for obtaining spectral estimates for a subset of the frames by processing 
synthesis parameters respectively associated with the frames of said subset; 

means for obtaining spectral estimates for the frames of the set which are not in said 
subset, with an interpolation of at least part of the synthesis parameters; 

means for evaluating the frames of the set weighted by an analysis window, by 
transforming in the time domain the spectral estimates respectively obtained for said frames; 
and 

means for modifying each evaluated frame by applying thereto a processing 
corresponding to a division by said analysis window and to a multiplication by a synthesis 
window; and 

means for synthesizing the audio signal as an overlap sum of the modified frames,

wherein the successive frames of said subset exhibit mutual time shifts of M samples,
M being an integer greater than N/2, while the successive frames of said set exhibit mutual
time shifts of M/p samples, p being an integer larger than 1,

and wherein, the samples of a frame having ranks i numbered from 0 to N-1, the
synthesis window f'_S(i) has a support limited to the ranks i ranging from N/2-M/p to
N/2+M/p and satisfies f'_S(i) + f'_S(i+M/p) = A for N/2-M/p < i < N/2, A being a positive
constant.

24. (New) The device as claimed in claim 23, wherein the synthesis window
f'_S(i) increases for i ranging from N/2-M/p to N/2.

25. (New) The device as claimed in claim 24, wherein the synthesis window
f'_S(i) is a raised sinusoid for N/2-M/p < i < N/2+M/p.

26. (New) The device as claimed in claim 23, further comprising: 

means for associating data representing an interpolation error with the frames which 
are not in said subset; and 

means for correcting at least one of the interpolated synthesis parameters by means of 
said data. 

27. (New) The device as claimed in claim 23, further comprising: 

means for associating data representing an interpolator filter with the frames which 
are not in said subset; and 

means for interpolating at least one of the synthesis parameters by means of the 
interpolator filter represented by said data. 

28. (New) The device as claimed in claim 23, wherein the synthesis parameters 
comprise cepstral coefficients subjected to the interpolation. 



Remarks: 

Allowance of all claims is respectfully requested. The Commissioner is authorized to 
charge any additional fees under 37 C.F.R. § 1.16 and § 1.17, or credit any overpayment to 
Deposit Account No. 20-1504 (MTR.0030US). 






Respectfully submitted, 



Dan C. Hu, Registration No. 40,025 
TROP, PRUNER & HU, P.C. 
8554 Katy Freeway, Suite 100 
Houston, Texas 77024-1805 
(713) 468-8880 [Phone] 
(713) 468-8883 [Fax] 
VERSIONS WITH MARKINGS TO SHOW CHANGES

IN THE CLAIMS:

New claims 17-28 have been added. Amendments of the claims are 
indicated below: 

1. (Amended) A method of analyzing an audio signal [(x)] processed 
by successive frames of N samples, N being an integer greater than 1, comprising 
the steps of: 

[in which] weighting the samples of each frame [are weighted] by an 
analysis window [(f_A)] of Hamming, Hanning, Kaiser or similar type;[,]

calculating a spectrum of the audio signal [is calculated] by transforming 
each frame of weighted samples in the frequency domain; and [,] 

processing the spectrum of the audio signal [is processed so as] to deliver 
synthesis parameters [(cx_sup, cx_inf,Emix)] for [synthesizing] a signal derived 
from the analyzed audio signal; [,] 

[characterized in that] wherein the successive frames comprise an 
alternation of frames for which [are delivered] complete sets of synthesis 
parameters are delivered and of frames for which [are delivered] incomplete sets of 
synthesis [parameters, and in that] parameters are delivered, and wherein the 
successive frames for which complete sets of synthesis parameters are delivered 
exhibit mutual overlaps of less than N/2 samples. 

2. (Amended) The method as claimed in claim 1, [in which] wherein
the incomplete sets of synthesis parameters include data [(icx[n-1/2])] representing
an error [(ecx[n-1/2])] of interpolation of at least one of the synthesis parameters.

3. (Amended) The method as claimed in claim 1, [in which] wherein
the incomplete sets of synthesis parameters include data [(iP)] representing a filter
[(128)] for interpolating at least one of the synthesis parameters.

4. (Amended) The method as claimed in [in any one of claims 1 to 3, 
in which] claim 1, wherein the processing of the spectrum of the audio signal [(x)] 
comprises [an extraction of] extracting coding parameters [(cx_sup, cx_inf, Emix) 
with a view to the transmission and/or the store of the] for transmitting and/or 
storing a coded audio signal. 



5. (Amended) The method as claimed in [any one of claims 1 to 3, in 
which] claim 1, wherein the processing of the spectrum of the audio signal [(x)] 
comprises a denoising operation by spectral subtraction. 

6. (Amended) An audio processing device, [comprising analysis 
means for executing a method as claimed in claims 1 to 5] for analyzing an audio 
signal by successive frames of N samples, N being an integer greater than 1, 
comprising: 

means for weighting the samples of each frame by an analysis window of 
Hamming, Hanning, Kaiser or similar type; 

means for calculating a spectrum of the audio signal by transforming each 
frame of weighted samples in the frequency domain; and 

means for processing the spectrum of the audio signal to deliver synthesis 
parameters for a signal derived from the analyzed audio signal; 

wherein the successive frames comprise an alternation of frames for which 
complete sets of synthesis parameters are delivered and of frames for which 
incomplete sets of synthesis parameters are delivered, and wherein the successive 
frames for which complete sets of synthesis parameters are delivered exhibit 
mutual overlaps of less than N/2 samples. 

7. (Amended) [A method of synthesizing an audio signal, in which
successive spectral estimates (Y) corresponding respectively to frames of N
samples of the audio signal which are weighted by an analysis window (f_A) are
obtained, the successive frames exhibiting mutual overlaps of L samples, each
frame of the audio signal is evaluated by transforming the spectral estimates in the
time domain, and the frames evaluated are combined to form the synthesized
signal (x), characterized in that each evaluated frame is modified by applying
thereto a processing corresponding to a division by said analysis window (f_A) and
to a multiplication by a synthesis window (f_S), and the synthesized signal is formed
as an overlap sum of the modified frames, and in that, the number L being smaller
than N/2 and the samples of a frame having ranks i numbered from 0 to N-1, the
synthesis window f_S(i) satisfies f_S(N-L+i) + f_S(i) = A for 0 < i < L, and is equal to
A for L < i < N-L, A being a positive constant] The device as claimed in claim 6,
wherein the incomplete sets of synthesis parameters include data representing an
error of interpolation of at least one of the synthesis parameters.

8. (Amended) [The method as claimed in claim 7, in which the
synthesis window f_S(i) increases from 0 to A for i going from 0 to L] The device as
claimed in claim 6, wherein the incomplete sets of synthesis parameters include
data representing a filter for interpolating at least one of the synthesis parameters.

9. (Amended) [The method as claimed in claim 8, in which the
synthesis window f_S(i) for 0 < i < L is a raised half-sinusoid] The device as
claimed in claim 6, wherein the processing means comprise means for extracting
coding parameters for transmitting and/or storing a coded audio signal.

10. (Amended) [A method of synthesizing an audio signal, in which a
set of successive overlapping frames of N samples of the audio signal which are
weighted by an analysis window (f_A) is evaluated, by transforming in the time
domain spectral estimates (Y) corresponding respectively to said frames, and the
evaluated frames are combined to form the synthesized signal (x), characterized in
that, for a subset of the evaluated frames, the spectral estimates are obtained by
processing synthesis parameters (cx_sup_q, cx_inf_q, Emix) respectively
associated with the frames of said subset while, for the frames which do not form
part of the subset, the spectral estimates are obtained with an interpolation of a part
at least of the synthesis parameters, in that the successive frames of said subset
exhibit mutual time shifts of M samples, the number M being larger than N/2,
while the successive frames of said set exhibit mutual time shifts of M/p samples,
p being an integer larger than 1, in that each evaluated frame is modified by
applying thereto a processing corresponding to a division by said analysis window
(f_A) and to a multiplication by a synthesis window (f'_S), and the synthesized signal
is formed as an overlap sum of the modified frames, and in that, the samples of a
frame having ranks i numbered from 0 to N-1, the synthesis window f'_S(i) has a
support limited to the ranks i ranging from N/2 - M/p to N/2 + M/p and satisfies
f'_S(i) + f'_S(i + M/p) = A for N/2 - M/p < i < N/2, A being a positive constant] The
device as claimed in claim 6, wherein the processing means comprise spectral
subtraction means for cancelling noise in the audio signal.



11. (Amended) [The method as claimed in claim 10, in which] A
method of synthesizing an audio signal comprising the steps of:

obtaining successive spectral estimates respectively corresponding to
frames of N samples of the audio signal weighted by an analysis window, N being
an integer greater than 1;

evaluating each frame of the audio signal by transforming the spectral
estimates in the time domain;

modifying each evaluated frame by applying thereto a processing
corresponding to a division by said analysis window and to a multiplication by a
synthesis window; and

synthesizing the audio signal as an overlap sum of the modified frames,

wherein the successive frames exhibit mutual overlaps of L samples, L
being an integer greater than 1 and smaller than N/2,

and wherein the synthesis window [f'_S(i) increases for i ranging from N/2-M/p
to N/2] f_S(i) satisfies f_S(N-L+i) + f_S(i) = A for 0 < i < L, and f_S(i) = A for
L < i < N-L, A being a positive constant and i being a sample rank in a frame with
0 < i < N.

12. (Amended) The method as claimed in claim 11, [in which] wherein
the synthesis window [f'_S(i) for N/2-M/p < i < N/2+M/p is a raised sinusoid] f_S(i)
increases from 0 to A for i ranging from 0 to L.

13. (Amended) The method as claimed in claim 12, wherein the
synthesis window f_S(i) for 0 < i < L is a raised half-sinusoid [The method as
claimed in any one of claims 10 to 12, in which data (icx_q[n-1/2]) representing an
interpolation error (ecx_q[n-1/2]) are associated with the frames which do not form
part of said subset, and are used to correct at least one of the interpolated synthesis
parameters (cx_i[n-1/2])].



14. (Amended) [The method as claimed in any one of claims 10 to 12,
in which data (iP) representing an interpolator filter (128) are associated with the
frames which do not form part of said subset, and are used to interpolate at least
one of the synthesis parameters] An audio processing device, comprising:

means for obtaining successive spectral estimates respectively
corresponding to frames of N samples of an audio signal weighted by an analysis
window, N being an integer greater than 1;

means for evaluating each frame of the audio signal by transforming the
spectral estimates in the time domain;

means for modifying each evaluated frame by applying thereto a
processing corresponding to a division by said analysis window and to a
multiplication by a synthesis window; and

means for synthesizing the audio signal as an overlap sum of the modified
frames,

wherein the successive frames exhibit mutual overlaps of L samples, L
being an integer greater than 1 and smaller than N/2,

and wherein the synthesis window f_S(i) satisfies f_S(N-L+i) + f_S(i) = A for
0 < i < L, and f_S(i) = A for L < i < N-L, A being a positive constant and i being a
sample rank in a frame with 0 < i < N.

15. (Amended) The device as claimed in claim 14, wherein the
synthesis window f_S(i) increases from 0 to A for i ranging from 0 to L [The method
as claimed in any one of claims 10 to 14, in which the synthesis parameters
comprise cepstral coefficients (cx[n]) subjected to the interpolation].

16. (Amended) [An audio processing device, comprising synthesis
means for executing a method as claimed in any one of claims 7 to 15] The device
as claimed in claim 15, wherein the synthesis window f_S(i) for 0 < i < L is a raised
half-sinusoid.
METHODS AND DEVICES FOR AUDIO ANALYSIS AND SYNTHESIS

The present invention relates to the analysis and
synthesis of audio signals, on the basis of
representations of these signals in the spectral domain.

It applies in particular, but not exclusively, to the
coding of speech, in narrowband or in broadband, in
various coding bit rate ranges. Among the other fields
of application, mention may be made of denoising by
spectral subtraction (see EP-A 0 534 837 or
WO 99/14739).

In the methods of analysis in question, the spectrum of
the signal is obtained by transforming successive
frames to the frequency domain. The transformation
employed is usually the fast Fourier transform (FFT);
however, other known transforms can be used. In the
frequent case of a sampling of the signal at 8 kHz, the
number N of samples per frame is typically of the order
of 100 to 500, this representing frames of a few tens
of milliseconds. To benefit from the maximum resolution
in frequency, the FFT is performed on 2N points, N zero
samples being added to the N samples of the frame.
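The windowing and zero-padding step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the symmetric form of the Hamming window, the frame length N = 64 and the test tone are all hypothetical choices, and a direct DFT stands in for the FFT so that the example needs nothing beyond the Python standard library.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window (one common form; an assumption here)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_samples - 1))
            for i in range(n_samples)]


def zero_padded_spectrum(frame, window):
    """Weight the frame, append N zero samples, return the 2N-point DFT."""
    n = len(frame)
    padded = [x * w for x, w in zip(frame, window)] + [0.0] * n
    size = 2 * n
    # Direct O(size^2) DFT kept for readability; an FFT yields the same values.
    return [sum(padded[t] * cmath.exp(-2j * math.pi * k * t / size)
                for t in range(size))
            for k in range(size)]


N = 64  # illustrative only; the text cites 100 to 500 samples at 8 kHz
# Test tone with 8 cycles per frame, i.e. bin 8 on an N-point grid.
frame = [math.cos(2 * math.pi * 8 * i / N) for i in range(N)]
spectrum = zero_padded_spectrum(frame, hamming(N))

# On the 2N-point grid the same tone falls on bin 16: zero padding
# halves the spacing between frequency samples.
peak_bin = max(range(N), key=lambda k: abs(spectrum[k]))
```

Zero padding does not add information to the frame; it samples the same windowed spectrum on a grid twice as fine, which is what the text means by benefiting from the maximum resolution in frequency.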



The spectrum obtained by Fourier transform of the
signal frame is the convolution of the real spectrum of
the signal by the Fourier transform of the signal
analysis window. This analysis window, which weights
the samples of each frame, is required so as to take
account of the finite duration of the frame. If the
signal frame is subjected to the FFT directly, that is
to say if a rectangular analysis window is used, the
spectrum obtained is disturbed by the secondary peaks
of the FFT of the analysis window. To limit this
drawback, which is especially noticeable when
parameters representing the signal or the noise have to
be extracted from the spectra, recourse is had to
windows having better spectral properties, that is to
say weighting functions whose support is limited to N
samples and whose Fourier transform has its energy
concentrated in a narrow peak with a strong attenuation
of the secondary peaks. The most common of these
windows are the Hamming, Hanning and Kaiser windows.

In the analysis and synthesis procedure known as OLA
("Overlap-And-Add"), the successive frames exhibit
mutual overlaps of 50% (N/2 samples). Since the
analysis windows commonly used satisfy the property
f_A(i+N/2) + f_A(i) = 1, synthesis can be performed simply
by overlap-summing the frames of N samples, which
frames are calculated in succession by inverse Fourier
transform of the spectra.
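The OLA property can be checked numerically. The sketch below is an illustration only: it assumes the periodic form of the Hanning window, 0.5 - 0.5 cos(2*pi*i/N), for which the two overlapping halves sum exactly to 1. Other windows or other forms of the same window may sum to a different constant and would then need a normalization. The helper names and the test signal are hypothetical, and the weighted frames stand in for the output of the inverse transform.

```python
import math


def hanning_periodic(n):
    """Periodic Hanning window; chosen because f(i + n/2) + f(i) = 1 exactly."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Sum frames placed every `hop` samples."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


N = 16
HOP = N // 2  # 50% overlap, as in OLA
f_a = hanning_periodic(N)

# The property quoted in the text: f_A(i + N/2) + f_A(i) = 1.
pair_sums = [f_a[i + HOP] + f_a[i] for i in range(HOP)]

# Weighted frames stand in for the inverse transforms of the spectra.
signal = [math.sin(0.3 * t) for t in range(HOP * 6)]
num_frames = len(signal) // HOP - 1
frames = [[signal[k * HOP + i] * f_a[i] for i in range(N)]
          for k in range(num_frames)]
rebuilt = overlap_add(frames, HOP)
```

Away from the first and last half frame, where only one frame contributes, `rebuilt` matches `signal` sample for sample, which is why no reweighting is needed at synthesis in this case.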


With the aim of refining the spectral representation,
certain procedures referred to as WOLA ("Weighted OLA")
use, for analysis, frames whose mutual overlaps are
more than 50%. For the synthesis, it is necessary to
reweight the samples of the frames before summing them.
These procedures increase the complexity of the
analysis and of the synthesis. In coding applications,
they also increase the transmission bit rate required.

An aim of the invention is to propose a scheme for
analyzing and synthesizing audio signals which makes it
possible to limit the rate of the analysis frames,
while using analysis windows having good spectral
properties.

The invention proposes a method of analyzing an audio
signal processed by successive frames of N samples, in
which the samples of each frame are weighted by an
analysis window of Hamming, Hanning, Kaiser or similar
type, a spectrum of the audio signal is calculated by
transforming each frame of weighted samples in the
frequency domain, and the spectrum of the audio signal
is processed so as to deliver parameters for
synthesizing a signal derived from the analyzed audio
signal. According to the invention, the successive
frames comprise an alternation of frames for which are
delivered complete sets of synthesis parameters, which
exhibit mutual overlaps of less than N/2 samples, i.e.
less than 50%, and of frames for which are delivered
incomplete sets of synthesis parameters.

The frames for which complete sets of synthesis
parameters are not delivered need not be subjected to any
spectral analysis. As a variant, an analysis may
nevertheless be performed for these frames, so as to
deliver incomplete sets of synthesis parameters
including data representing an error of interpolation
of at least one of the synthesis parameters and/or data
representing a filter for interpolating at least one of
the synthesis parameters.

In a first field of application of the method, the
processing of the spectrum of the audio signal
comprises an extraction of coding parameters with a
view to the transmission and/or the storage of the
coded audio signal. In a second field of application of
the method, the processing of the spectrum of the audio
signal comprises a denoising by spectral subtraction.
Other fields of application may also be envisaged among
audio processings.

A second aspect of the invention relates to a method of
synthesizing an audio signal, in which successive
spectral estimates corresponding respectively to frames
of N samples of the audio signal which are weighted by
an analysis window are obtained, the successive frames
exhibiting mutual overlaps of L samples, each frame of
the audio signal is evaluated by transforming the
spectral estimates in the time domain, and the frames
evaluated are combined to form the synthesized signal.
According to this method, each evaluated frame is
modified by applying thereto a processing corresponding
to a division by said analysis window and to a
multiplication by a synthesis window, and the
synthesized signal is formed as an overlap sum of the
modified frames. The number L being smaller than N/2
and the samples of a frame having ranks i numbered from
0 to N-1, the synthesis window f_S(i) satisfies
f_S(N-L+i) + f_S(i) = A for 0 < i < L, and is equal to A for
L < i < N-L, A being a positive constant.
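A minimal sketch of this synthesis step is given below, under several assumptions that are not taken from the text: the index ranges are read as half-open (rise on 0 <= i < L, flat part on L <= i < N-L), the raised half-sinusoid is written as (A/2)(1 - cos(pi*i/L)), a symmetric Hamming window serves as analysis window because it never reaches zero and can therefore be divided out, and the weighted frames stand in for the inverse transforms of the spectral estimates. All names and numeric values are hypothetical.

```python
import math


def hamming(n):
    """Symmetric Hamming analysis window (strictly positive, so divisible)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def synthesis_window(n, overlap, amplitude=1.0):
    """Rise over `overlap` samples, flat top at `amplitude`, mirrored fall."""
    rise = [0.5 * amplitude * (1.0 - math.cos(math.pi * i / overlap))
            for i in range(overlap)]
    flat = [amplitude] * (n - 2 * overlap)
    # Built so that f_S(N-L+i) + f_S(i) = A holds on the overlap.
    fall = [amplitude - r for r in rise]
    return rise + flat + fall


N, L, A = 32, 8, 1.0  # L < N/2, illustrative values
HOP = N - L           # frame shift giving an overlap of L samples
f_a = hamming(N)
f_s = synthesis_window(N, L, A)

num_frames = 4
signal = [math.sin(0.2 * t) + 0.5 * math.cos(0.05 * t)
          for t in range(HOP * (num_frames - 1) + N)]

# Evaluated frames: stand-ins for the inverse transforms (signal times f_A).
evaluated = [[signal[k * HOP + i] * f_a[i] for i in range(N)]
             for k in range(num_frames)]

# Divide by the analysis window, multiply by the synthesis window, overlap-sum.
output = [0.0] * len(signal)
for k, frame in enumerate(evaluated):
    for i in range(N):
        output[k * HOP + i] += frame[i] / f_a[i] * f_s[i]
```

Except for the first and last L samples, where only one frame contributes, `output` equals A times `signal`: wherever two frames overlap, the falling and rising parts of f_S sum to A, and elsewhere the flat part applies A directly.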

In a variant of the synthesis method according to the
invention, a set of successive overlapping frames of N
samples of the audio signal which are weighted by an
analysis window is evaluated, by transforming in the
time domain spectral estimates corresponding
respectively to said frames, and the evaluated frames
are combined to form the synthesized signal. For a
subset of the evaluated frames, the spectral estimates
are obtained by processing synthesis parameters
respectively associated with the frames of said subset
while, for the frames which do not form part of the
subset, the spectral estimates are obtained with an
interpolation of a part at least of the synthesis
parameters. The successive frames of said subset
exhibit mutual time shifts of M samples, the number M
being larger than N/2, while the successive frames of
said set exhibit mutual time shifts of M/p samples, p
being an integer larger than 1. Each evaluated frame is
modified by applying thereto a processing corresponding
to a division by said analysis window and to a
multiplication by a synthesis window, and the
synthesized signal is formed as an overlap sum of the
modified frames. The samples of a frame having ranks i
numbered from 0 to N-1, the synthesis window f'_S(i)
has a support limited to the ranks i ranging from
N/2 - M/p to N/2 + M/p and satisfies
f'_S(i) + f'_S(i + M/p) = A for N/2 - M/p < i < N/2, A
being a positive constant.
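The window of this variant can be illustrated in the same way. The sketch assumes a raised sinusoid of the form (A/2)(1 + cos(pi*(i - N/2)/(M/p))) centred on rank N/2, takes the constraint on the half-open range N/2 - M/p <= i < N/2, and uses hypothetical values N = 32, M = 20 and p = 2, so that M > N/2 and M/p is an integer. None of these choices come from the text.

```python
import math


def variant_synthesis_window(n, shift, amplitude=1.0):
    """Raised sinusoid centred on n/2, zero outside n/2 - shift .. n/2 + shift."""
    center = n // 2
    window = [0.0] * n
    for i in range(center - shift, center + shift + 1):
        window[i] = 0.5 * amplitude * (1.0 + math.cos(math.pi * (i - center) / shift))
    return window


N, M, P, A = 32, 20, 2, 1.0  # M > N/2, p > 1, illustrative values
SHIFT = M // P               # time shift between successive frames of the set
f_s_var = variant_synthesis_window(N, SHIFT, A)

# Constraint from the text: f'_S(i) + f'_S(i + M/p) = A on the rising half.
constraint = [f_s_var[i] + f_s_var[i + SHIFT]
              for i in range(N // 2 - SHIFT, N // 2)]

# Overlap sum of the window alone, for frames shifted by M/p samples.
num_frames = 6
total = [0.0] * (SHIFT * (num_frames - 1) + N)
for k in range(num_frames):
    for i in range(N):
        total[k * SHIFT + i] += f_s_var[i]
```

Between the centre of the first frame and the centre of the last one, `total` stays equal to A, so an overlap sum of frames that have been divided by the analysis window and multiplied by this window applies a constant gain to the signal.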
The invention also proposes audio processing devices
comprising means for implementing the hereinabove
methods of analysis and synthesis.

Other features and advantages of the present invention 
will become apparent in the description below of non- 
limiting exemplary embodiments, with reference to the 
appended drawings, in which: 

- figure 1 is a schematic diagram of an audio coder
  according to the invention;
- figures 2 and 3 are charts illustrating the
  formation of the audio signal frames in the coder
  of figure 1;
- figures 4 and 5 are graphs showing an exemplary
  spectrum of the audio signal and illustrating the
  extraction of the upper and lower envelopes of
  this spectrum;
- figure 6 is a schematic diagram of an example of
  quantization means usable in the coder of
  figure 1;
- figure 7 is a schematic diagram of means usable to
  extract parameters relating to the phase of the
  non-harmonic component in a variant of the coder
  of figure 1;
- figure 8 is a schematic diagram of an audio
  decoder corresponding to the coder of figure 1;
- figure 9 is a flowchart of an exemplary procedure
  for smoothing spectral coefficients and for
  extracting minimum phases implemented in the
  decoder of figure 8;
- figure 10 is a schematic diagram of modules for
  analysis and for spectral mixing of harmonic and
  non-harmonic components of the audio signal;
- figures 11 to 13 are graphs showing examples of
  nonlinear functions usable in the analysis module
  of figure 10;
- figures 14 and 15 are charts illustrating a way of
  carrying out the temporal synthesis of the signal
  frames in the decoder of figure 8;
- figures 16 and 17 are graphs showing windowing
  functions usable in the synthesis of the frames
  according to figures 14 and 15;
- figures 18 and 19 are schematic diagrams of
  interpolation means usable in a variant embodiment
  of the coder and of the decoder;
- figure 20 is a schematic diagram of interpolation
  means usable in another variant embodiment of the
  coder; and
- figures 21 and 22 are charts illustrating another
  way of carrying out the temporal synthesis of the
  signal frames in the decoder of figure 8, with the
  aid of an interpolation of parameters.

The coder and decoder described hereinbelow are digital
circuits which can, as is customary in the field of
audio signal processing, be embodied by programming a
digital signal processor (DSP) or an application
specific integrated circuit (ASIC).

The audio coder represented in figure 1 processes an
audio input signal x which, in the nonlimiting example
considered hereinbelow, is a speech signal. The signal
x is available in digital form, for example at a
sampling frequency F_e of 8 kHz. It is, for example,
delivered by an analog/digital converter processing the
amplified output signal from a microphone. The input
signal x can also be formed from another version,
analog or digital, coded or uncoded, of the speech
signal.

The coder comprises a module 1 which forms successive
frames of audio signal for the various processing
operations performed, and an output multiplexer 6 which
delivers an output stream Φ containing, for each frame,
sets of quantization parameters from which a decoder
will be capable of synthesizing a decoded version of
the audio signal.

The structure of the frames is illustrated by figures 2
and 3. Each frame 2 is composed of a number N of
consecutive samples of the audio signal x. The
successive frames exhibit mutual time shifts
corresponding to M samples, so that their overlap is
L = N-M samples of the signal. In the example
considered, where N = 256, M = 160 and L = 96, the
duration of the frames 2 is N/F_e = 32 ms, and a frame
is formed every M/F_e = 20 ms.
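
The frame arithmetic above can be reproduced with a short sketch.
The function name and the dummy input signal are hypothetical;
only the values N = 256, M = 160 and F_e = 8 kHz come from the
text.

```python
def split_frames(x, n=256, m=160):
    """Cut x into successive frames of n samples, each shifted by m samples."""
    frames = []
    start = 0
    while start + n <= len(x):
        frames.append(x[start:start + n])
        start += m
    return frames

FE = 8000                                   # sampling frequency in Hz
N, M = 256, 160
x = list(range(2 * FE))                     # two seconds of dummy samples
frames = split_frames(x, N, M)
overlap = N - M                             # L = 96 samples shared by neighbours
frame_ms = 1000.0 * N / FE                  # frame duration in ms
hop_ms = 1000.0 * M / FE                    # interval between frames in ms
```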

In a conventional manner, the module 1 multiplies the
samples of each frame 2 by a windowing function f_A,
preferably chosen for its good spectral properties. The
samples x(i) of the frame being numbered from i = 0 to
i = N-1, the analysis window f_A(i) can thus be a
Hamming window, expressed by:

$$ f_A(i) = 0.54 + 0.46 \cos\left( 2\pi \, \frac{i - (N-1)/2}{N} \right) \tag{1} $$

or a Hanning window, expressed by:

$$ f_A(i) = \frac{1}{2} \left( 1 + \cos\left( 2\pi \, \frac{i - (N-1)/2}{N} \right) \right) \tag{2} $$

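or else a Kaiser window, expressed by:

$$ f_A(i) = \frac{I_0\left( \alpha \sqrt{1 - \left( \frac{i - (N-1)/2}{(N-1)/2} \right)^2} \right)}{I_0(\alpha)} \tag{3} $$

where α is a coefficient equal, for example, to 6, and
I_0(.) designates the Bessel function of index 0.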

The coder of figure 1 carries out an analysis of the
audio signal in the spectral domain. It comprises a
module 3 which calculates the fast Fourier transform
(FFT) of each signal frame. The signal frame is shaped
before being subjected to the FFT module 3: the module
1 appends N = 256 zero samples thereto so as to obtain
the maximum resolution of the Fourier transform, and it
moreover performs a circular permutation of the
2N = 512 samples so as to compensate for the phase
effects resulting from the analysis window. This
modification of the frame is illustrated by figure 3.
The frame whose fast Fourier transform is calculated on
2N = 512 points commences with the last N/2 = 128
weighted samples of the frame, followed by the N = 256
zero samples, and terminates with the first N/2 = 128
weighted samples of the frame.
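
The effect of this shaping can be seen on a toy case. In the
sketch below, which uses hypothetical function names, a naive
discrete Fourier transform stands in for the FFT and a small N
is used purely to keep it fast. An impulse located at the centre
of the frame lands on the first sample after the circular
permutation, so its transform has zero phase at every frequency
index.

```python
import cmath

def shape_frame(weighted):
    """Return [last N/2 samples, N zeros, first N/2 samples], i.e. 2N values."""
    n = len(weighted)
    return weighted[n // 2:] + [0.0] * n + weighted[:n // 2]

def dft(values):
    """Naive O(n^2) discrete Fourier transform, standing in for the FFT."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]

N = 16                                      # small N keeps the naive DFT fast
impulse = [0.0] * N
impulse[N // 2] = 1.0                       # single sample at the frame centre
shaped = shape_frame(impulse)
spectrum = dft(shaped)
```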

The FFT module 3 obtains the spectrum of the signal for
each frame, whose modulus and phase are respectively
denoted |X| and φ_X, or |X(i)| and φ_X(i) for the
frequency indices i = 0 to i = 2N-1 (by virtue of the
symmetry of the Fourier transform and of the frames, we
may confine ourselves to the values for 0 ≤ i < N).

A fundamental-frequency detector 4 estimates for each
signal frame a value of the fundamental frequency F_0.
The detector 4 can apply any known procedure for
analyzing the speech signal of the frame to estimate
the fundamental frequency F_0, for example a procedure
based on the autocorrelation function or the average
magnitude difference function (AMDF), possibly preceded
by a module for whitening by linear prediction. The
estimate can also be made in the spectral domain or in
the cepstral domain. Another possibility is to evaluate
the time intervals between the consecutive breaks in
the speech signal which are attributable to closures of
the talker's glottis occurring over the duration of the
frame. Well-known procedures which can be used to
detect such microbreaks are described in the following
articles: M. Basseville et al., "Sequential detection
of abrupt changes in spectral characteristics of
digital signals" (IEEE Trans. on Information Theory,
1983, Vol. IT-29, No. 5, pages 708-723); R. Andre-Obrecht,
"A new statistical approach for the automatic
segmentation of continuous speech signals" (IEEE Trans.
on Acous., Speech and Sig. Proc., Vol. 36, No. 1,
January 1988); and C. Murgia et al., "An algorithm for
the estimation of glottal closure instants using the
sequential detection of abrupt changes in speech
signals" (Signal Processing VII, 1994, pages 1685-1688).

The estimated fundamental frequency F_0 is quantized,
for example by scalar quantization, by a module 5,
which provides the output multiplexer 6 with an index
iF of quantization of the fundamental frequency for
each frame of the signal.

The coder uses cepstral parametric modelings to
represent an upper envelope and a lower envelope of the
spectrum of the audio signal. The first step of the
cepstral transformation consists in applying a spectral
compression function to the modulus of the spectrum of
the signal, which function may be a logarithmic or root
function. The module 8 of the coder thus carries out,
for each value X(i) of the spectrum of the signal
(0 ≤ i < N), the following transformation:

$$ LX(i) = \log\left( |X(i)| \right) \tag{4} $$

in the case of a logarithmic compression or

$$ LX(i) = |X(i)|^{\gamma} \tag{5} $$

in the case of a root compression, γ being an exponent
lying between 0 and 1.
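
The two compression laws, together with their reciprocals, may be
sketched as follows. The function names, the natural logarithm
and the small floor guarding against log(0) are assumptions made
for the example; the text fixes neither the base of the logarithm
nor any such floor.

```python
import math

def compress(modulus, mode="log", gamma=0.5):
    """Logarithmic compression, or root compression with exponent gamma."""
    if mode == "log":
        return math.log(max(modulus, 1e-12))    # floor avoids log(0)
    return modulus ** gamma

def decompress(value, mode="log", gamma=0.5):
    """Reciprocal function: exponential, or power 1/gamma."""
    if mode == "log":
        return math.exp(value)
    return value ** (1.0 / gamma)
```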

The compressed spectrum LX of the audio signal is
processed by a module 9 which extracts spectral
amplitudes associated with the harmonics of the signal
corresponding to the multiples of the estimated
fundamental frequency F_0. These amplitudes are then
interpolated by a module 10 so as to obtain a
compressed upper envelope denoted LX_sup.

It should be noted that the spectral compression could
equivalently be performed after determining the
amplitudes associated with the harmonics. It could also
be performed after interpolation, and this would merely
modify the form of the interpolation functions.

The module 9 for extracting the maxima takes account of
any variation in the fundamental frequency over the
analysis frame, errors which the detector 4 may make,
as well as inaccuracies related to the discrete nature
of the frequency sampling. To do this, the search for
the amplitudes of the spectral peaks does not consist
simply in taking the values LX(i) corresponding to the
indices i such that i.F_e/2N is the frequency closest to
a harmonic of frequency k.F_0 (k ≥ 1). The spectral
amplitude retained for a harmonic of order k is a local
maximum of the modulus of the spectrum in the
neighborhood of the frequency k.F_0 (this amplitude is
obtained directly in compressed form when the spectral
compression 8 is performed before the extraction of the
maxima 9).
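
A minimal sketch of such a search for local maxima is given
below. It assumes that the bin nearest to k.F_0 is obtained by
rounding 2N.k.F_0/F_e to the nearest integer and that the search
extends over a fixed half-width of five bins on either side; both
the function name and this half-width are hypothetical choices
for the example.

```python
import math

def harmonic_peaks(lx, f0, fe, half_width=5):
    """lx[i] is the compressed modulus at bin i (0 <= i < N) of a 2N-point FFT.

    Returns one tuple (k, nearest_bin, peak_bin, peak_value) per harmonic.
    """
    n_fft = 2 * len(lx)
    peaks = []
    k = 1
    while k * f0 < fe / 2.0:
        center = int(math.floor(n_fft * k * f0 / fe + 0.5))   # bin nearest k.F_0
        lo = max(center - half_width, 0)
        hi = min(center + half_width, len(lx) - 1)
        best = max(range(lo, hi + 1), key=lambda i: lx[i])    # local maximum
        peaks.append((k, center, best, lx[best]))
        k += 1
    return peaks
```

For example, with F_0 = 200 Hz, F_e = 8 kHz and 2N = 512, the
first harmonic falls at 12.8 bins, so the search is centred on
bin 13 and retains a peak located at bin 14 if the spectrum is
larger there.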

Figures 4 and 5 show an exemplary form of the
compressed spectrum LX, where it may be seen that the
maximum amplitudes of the harmonic peaks do not
necessarily coincide with the amplitudes corresponding
to the integer multiples of the estimated fundamental
frequency F_0. Since the sides of the peaks are fairly
steep, a small error in the positioning of the
fundamental frequency F_0, amplified by the harmonic
index k, may greatly distort the estimated upper
envelope of the spectrum and cause poor modeling of the
formant structure of the signal. For example, directly
taking the spectral amplitude for the frequency 3.F_0 in
the case of figures 4 and 5 would produce a sizeable
error in the extraction of the upper envelope in the
neighborhood of the harmonic of order k = 3, although,
in the example drawn, this relates to a zone of
sizeable energy. By performing the interpolation on the
basis of the actual maximum, this kind of error in
estimating the upper envelope is avoided.



In the example represented in figure 4, the
interpolation is performed between points whose
abscissa is the frequency corresponding to the maximum
of the amplitude of a spectral peak, and whose ordinate
is this maximum, before or after compression.

The interpolation performed to calculate the upper
envelope LX_sup is a simple linear interpolation. Of
course, some other form of interpolation could be used
(for example polynomial or spline).

In the preferred variant represented in figure 5, the
interpolation is performed between points whose
abscissa is a frequency k.F_0 which is a multiple of the
fundamental frequency (in fact the closest frequency in
the discrete spectrum) and whose ordinate is the
maximum amplitude, before or after compression, of the
spectrum in the neighborhood of this multiple
frequency.

By comparing figures 4 and 5, it may be seen that the
mode of extraction according to figure 5, which
repositions the peaks on the harmonic frequencies,
leads to better accuracy with regard to the amplitude
of the peaks which will be attributed by the decoder to
the frequencies which are multiples of the fundamental
frequency. A slight frequency displacement may occur in
the position of these peaks; this is not very
significant perceptually and is in any case not avoided
in the case of figure 4 either. In the case of figure
4, the anchoring points for the interpolation are one
and the same as the vertices of the harmonic peaks. In
the case of figure 5, these anchoring points must lie
precisely at the frequencies which are multiples of the
fundamental frequency, their amplitudes corresponding
to those of the peaks.
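
The linear interpolation between anchoring points can be sketched
as follows. The anchor bins would be the harmonic bins in the
variant of figure 5 or the peak bins in that of figure 4. Holding
the first and last amplitudes constant outside the anchors is an
assumption of this example, as is the function name; the anchor
bins are assumed to be strictly increasing.

```python
def upper_envelope(n_bins, anchor_bins, anchor_amps):
    """Piecewise-linear envelope through the points (anchor_bins[j], anchor_amps[j])."""
    envelope = []
    j = 0
    for i in range(n_bins):
        if i <= anchor_bins[0]:
            envelope.append(anchor_amps[0])         # hold the first value
        elif i >= anchor_bins[-1]:
            envelope.append(anchor_amps[-1])        # hold the last value
        else:
            while anchor_bins[j + 1] < i:           # advance to the right segment
                j += 1
            x0, x1 = anchor_bins[j], anchor_bins[j + 1]
            y0, y1 = anchor_amps[j], anchor_amps[j + 1]
            envelope.append(y0 + (y1 - y0) * (i - x0) / (x1 - x0))
    return envelope
```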

The search interval for the amplitude maximum
associated with a harmonic of rank k is centered on the
index i of the frequency of the FFT closest to k.F_0,
i.e.

$$ i = \left\lfloor \frac{2N k F_0}{F_e} + \frac{1}{2} \right\rfloor $$

where ⌊a⌋ designates the integer equal to or
immediately less than the number a. The width of this
search interval depends on the sampling frequency F_e,
on the size 2N of the FFT and on the possible range of
variation of the fundamental frequency. This width is
typically of the order of ten frequencies with the
exemplary values considered earlier. It may be rendered
adjustable as a function of the value F_0 of the
fundamental frequency and of the number k of the
harmonic.

In order to improve the resolution in the low
frequencies and hence to more faithfully represent the
amplitudes of the harmonics in this zone, a nonlinear
distortion of the frequency scale is carried out on the
compressed upper envelope by a module 12 before the
module 13 performs the inverse fast Fourier transform
(IFFT) providing the cepstral coefficients cx_sup.

The nonlinear distortion allows more efficient
minimization of the modeling error. It is, for example,
performed on a frequency scale of Mel or Bark type.
This distortion may possibly depend on the estimated
fundamental frequency F_0. Figure 1 illustrates the case
of the Mel scale. The relation between the frequencies
F of the linear spectrum, expressed in hertz, and the
frequencies F' of the Mel scale is as follows:

$$ F' = \frac{1000}{\log_{10}(2)} \, \log_{10}\left( 1 + \frac{F}{1000} \right) \tag{6} $$

In order to limit the transmission bit rate, a
truncation of the cepstral coefficients cx_sup is
performed. The IFFT module 13 need only calculate a
cepstral vector of NCS cepstral coefficients of orders
0 to NCS-1. By way of example, NCS may be equal to 16.
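
The calculation of a truncated cepstral vector from a compressed
envelope may be sketched as follows. This is a simplified
illustration under stated assumptions: the nonlinear distortion
of the frequency scale is omitted, the envelope is taken on the
bins 0 to N of a 2N-point grid and extended by even symmetry, and
the inverse transform is written out with cosines rather than
computed by an IFFT. The function name is hypothetical.

```python
import math

def truncated_cepstrum(lx_env, ncs=16):
    """lx_env[k] is the compressed envelope at bin k, for k = 0..N (2N-point grid)."""
    n = len(lx_env) - 1
    cepstrum = []
    for m in range(ncs):
        # Inverse DFT of the even-symmetric spectrum, written with cosines only
        acc = lx_env[0] + (-1) ** m * lx_env[n]
        acc += 2.0 * sum(lx_env[k] * math.cos(math.pi * k * m / n)
                         for k in range(1, n))
        cepstrum.append(acc / (2.0 * n))
    return cepstrum
```

Only the NCS coefficients of lowest order are computed, which
mirrors the truncation described above.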



Post-filtering in the cepstral domain, referred to as
post-liftering, is applied by a module 15 to the
compressed upper envelope LX_sup. This post-liftering
corresponds to a manipulation of the cepstral
coefficients cx_sup delivered by the IFFT module 13,
which corresponds approximately to a post-filtering of
the harmonic part of the signal by a transfer function
having the conventional form:

$$ H(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \left( 1 - \mu z^{-1} \right) \tag{7} $$

where A(z) is the transfer function of a filter for
linear prediction of the audio signal, γ1 and γ2 are
coefficients lying between 0 and 1, and μ is a
pre-emphasizing coefficient, possibly zero. The relation
between the post-liftered coefficient of order i,
denoted c_p(i), and the corresponding cepstral
coefficient c(i) = cx_sup(i) delivered by the module 13
is then:

$$ c_p(0) = c(0), \qquad c_p(i) = \left( 1 + \gamma_2^{\,i} - \gamma_1^{\,i} \right) c(i) - \frac{\mu^{i}}{i} \quad \text{for } i > 0 \tag{8} $$

The optional pre-emphasizing coefficient μ may be
controlled by setting as constraint the preserving of
the value of the cepstral coefficient cx_sup(1)
relating to the slope. Specifically, the value of
c(1) = cx_sup(1) of white noise filtered by the
pre-emphasizing filter corresponds to the
pre-emphasizing coefficient. The latter may thus be
chosen as follows: μ = (γ2 - γ1).c(1).

After the post-lifter 15, a normalizing module 16 again
modifies the cepstral coefficients by imposing the
constraint of exact modeling of a point of the initial
spectrum, which is preferably the point of greatest
energy from among the spectral maxima extracted by the
module 9. In practice, this normalization modifies only
the value of the coefficient c_p(0).

The normalizing module 16 operates as follows: it
recalculates a value of the synthesized spectrum at the
frequency of the maximum indicated by the module 9, by
Fourier transform of the truncated and post-liftered
cepstral coefficients, taking into account the
nonlinear distortion of the frequency axis; it
determines a normalizing gain g_N through the
logarithmic difference between the value of the maximum
as delivered by the module 9 and this recalculated
value; and it adds the gain g_N to the post-liftered
cepstral coefficient c_p(0). This normalization may be
viewed as being part of the post-liftering.
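
The post-liftering and the normalization may be sketched
together. The example assumes that the post-liftering relation
takes the form c_p(0) = c(0) and
c_p(i) = (1 + γ2^i - γ1^i).c(i) - μ^i/i for i > 0, that the
compressed spectrum is recovered from the cepstrum as
c(0) + 2.Σ c(m).cos(m.ω), and that the nonlinear distortion of
the frequency axis can be ignored. These conventions, like the
function names, are assumptions made for the illustration.

```python
import math

def post_lifter(c, gamma1, gamma2, mu=0.0):
    """Apply the assumed post-liftering relation; c[0] is left unchanged."""
    cp = [c[0]]
    for i in range(1, len(c)):
        cp.append((1.0 + gamma2 ** i - gamma1 ** i) * c[i] - mu ** i / i)
    return cp

def modelled_log_amplitude(cp, omega):
    """Compressed amplitude modelled by the cepstrum at normalized frequency omega."""
    return cp[0] + 2.0 * sum(cp[m] * math.cos(m * omega)
                             for m in range(1, len(cp)))

def normalize(cp, lx_peak, omega_peak):
    """Add to cp[0] the gain g_N that makes the model exact at the chosen peak."""
    gain = lx_peak - modelled_log_amplitude(cp, omega_peak)
    return [cp[0] + gain] + cp[1:], gain

c = [1.0, 0.5, -0.2, 0.1]                   # illustrative cepstral coefficients
gamma1, gamma2 = 0.5, 0.8
mu = (gamma2 - gamma1) * c[1]               # choice that preserves c(1)
cp = post_lifter(c, gamma1, gamma2, mu)
cp_norm, g_n = normalize(cp, lx_peak=3.0, omega_peak=0.4)
```

With μ chosen as (γ2 - γ1).c(1), the coefficient of order 1 is
left unchanged by the post-liftering, and only the coefficient of
order 0 is altered by the normalization.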

The post-liftered and normalized cepstral coefficients
are quantized by a module 18 which transmits
corresponding quantization indices icxs to the output
multiplexer 6 of the coder.

The module 18 can operate by vector quantization on the
basis of cepstral vectors formed of post-liftered and
normalized coefficients, here denoted cx[n] for the
signal frame of rank n. By way of example, the cepstral
vector cx[n] of NCS = 16 cepstral coefficients cx[n,0],
cx[n,1], ..., cx[n,NCS-1] is distributed as four
cepstral subvectors each containing four coefficients
of consecutive orders. The cepstral vector cx[n] can be
processed by the means represented in figure 6, forming
part of the quantization module 18. These means
implement, for each component cx[n,i], a predictor of
the form:

$$ cx_p[n,i] = \left( 1 - \alpha(i) \right) \cdot rcx[n,i] + \alpha(i) \cdot rcx[n-1,i] \tag{9} $$

where rcx[n] designates a residual prediction vector
for the frame of rank n whose components are
respectively denoted rcx[n,0], rcx[n,1], ...,
rcx[n,NCS-1], and α(i) designates a prediction
coefficient chosen so as to be representative of an
assumed inter-frame correlation. After quantization of
the residuals, this residual vector is defined by:

$$ rcx[n,i] = \frac{cx[n,i] - \alpha(i) \cdot rcx\_q[n-1,i]}{2 - \alpha(i)} \tag{10} $$

where rcx_q[n-1] designates the quantized residual
vector for the frame of rank n-1, whose components are
respectively denoted rcx_q[n-1,0], rcx_q[n-1,1], ...,
rcx_q[n-1,NCS-1].

The numerator of relation (10) is obtained by a
subtractor 20, whose output vector components are
divided by the quantities 2-α(i) at 21. For
quantization purposes, the residual vector rcx[n] is
subdivided into four subvectors, corresponding to the
subdivision into four cepstral subvectors. On the basis
of a dictionary obtained by prior learning, the unit 22
undertakes the vector quantization of each subvector of
the residual vector rcx[n]. This quantization can
consist, for each subvector srcx[n], in selecting from
the dictionary the quantized subvector srcx_q[n] which
minimizes the quadratic error ||srcx[n] - srcx_q[n]||^2. The
set icxs of quantization indices icx, corresponding to
the addresses in the dictionary or dictionaries of the
quantized residual subvectors srcx_q[n], is provided to
the output multiplexer 6.
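
The selection of the quantized subvector minimizing the quadratic
error amounts to a nearest-neighbour search in the dictionary. A
minimal sketch follows; the function name and the tiny dictionary
used in the example are hypothetical, a real dictionary being
obtained by prior learning.

```python
def quantize_subvector(sub, codebook):
    """Return (index, entry) of the dictionary entry closest to sub."""
    def squared_error(entry):
        return sum((a - b) ** 2 for a, b in zip(sub, entry))
    index = min(range(len(codebook)), key=lambda j: squared_error(codebook[j]))
    return index, codebook[index]

# Toy dictionary of three subvectors of four coefficients each
codebook = [[0.0, 0.0, 0.0, 0.0],
            [1.0, 1.0, 1.0, 1.0],
            [-1.0, 0.0, 1.0, 0.0]]
index, quantized = quantize_subvector([0.9, 1.1, 0.8, 1.2], codebook)
```

The returned index plays the role of the quantization index icx,
that is, the address of the selected subvector in the dictionary.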

The unit 22 also delivers the values of the quantized
residual subvectors, which form the vector rcx_q[n].
The latter is delayed by one frame at 23, and its
components are multiplied by the coefficients α(i) at
24 so as to provide the vector to the negative input of
the subtractor 20. The latter vector is, on the other
hand, provided to an adder 25, the other input of which
receives a vector formed by the components of the
quantized residual rcx_q[n], respectively multiplied by
the quantities 1-α(i) at 26. The adder 25 thus delivers
the quantized cepstral vector cx_q[n] which will be
recovered by the decoder.

The prediction coefficient α(i) can be optimized
separately for each of the cepstral coefficients. The
quantization dictionaries may also be optimized
separately for each of the four cepstral subvectors.
Moreover, it is possible, in a manner known per se, to
normalize the cepstral vectors before applying the
prediction/quantization scheme, on the basis of the
variance of the cepstra.

It should be noted that the above scheme for quantizing
the cepstral coefficients may be applied in respect of
certain of the frames only. For example, provision may
be made for a second mode of quantization as well as a
process for selecting that one of the two modes which
minimizes a least squares criterion with the cepstral
coefficients to be quantized, and a bit indicating
which of the two modes has been selected may be
transmitted with the frame quantization indices.

The quantized cepstral coefficients cx_sup_q = cx_q[n]
provided by the adder 25 are addressed to a module 28
which recalculates the spectral amplitudes associated
with one or more of the harmonics of the fundamental
frequency F_0 (figure 1). These spectral amplitudes are,
for example, calculated in compressed form, by applying
the Fourier transform to the quantized cepstral
coefficients while taking account of the nonlinear
distortion of the frequency scale used in the cepstral
transformation. The amplitudes thus recalculated are
provided to an adaptation module 29 which compares them
with amplitudes of maxima determined by the extraction
module 9.

The adaptation module 29 controls the post-lifter 15 in
such a way as to minimize a discrepancy in modulus
between the spectrum of the audio signal and the
corresponding modulus values calculated at 28. This
discrepancy in modulus can be expressed by a sum of
absolute values of differences of amplitudes,
compressed or otherwise, corresponding to one or more
of the harmonic frequencies. This sum can be weighted
as a function of the spectral amplitudes associated
with these frequencies.

Optimally, the discrepancy in modulus taken into
account in the adaptation of the post-liftering would
take account of all the harmonics of the spectrum.
However, in order to reduce the complexity of the
optimization, the module 28 can resynthesize the
spectral amplitudes for just one or more frequencies
which are multiples of the fundamental frequency F_0 and
which are selected on the basis of the magnitude of the
modulus of the spectrum in absolute value. The
adaptation module 29 can, for example, consider the
three most intense spectral peaks in the calculation of
the discrepancy in modulus to be minimized.

In another embodiment, the adaptation module 29
estimates a curve of spectral masking of the audio
signal by means of a psycho-acoustic model, and the
frequencies taken into account in the calculation of
the discrepancy in modulus to be minimized are selected
on the basis of the magnitude of the modulus of the
spectrum in relation to the masking curve (it is, for
example, possible to take the three frequencies for
which the modulus of the spectrum most exceeds the
masking curve). Various conventional methods can be
used to calculate the masking curve from the audio
signal. It is, for example, possible to use that
developed by J.D. Johnston ("Transform Coding of Audio
Signals Using Perceptual Noise Criteria", IEEE Journal
on Selected Areas in Communications, Vol. 6, No. 2,
February 1988).

To carry out the adaptation of the post-liftering, the
module 29 can use a filter identification model. A
simpler method consists in predefining a collection of
sets of post-liftering parameters, that is to say a
collection of pairs γ1, γ2 in the case of post-liftering
according to relations (8), in performing the
operations incumbent on the modules 15, 16, 18 and 28
for each of these sets of parameters, and in retaining
that of the sets of parameters which leads to the
minimum discrepancy in modulus between the spectrum of
the signal and the recalculated values. The
quantization indices provided by the module 18 are then
those which relate to the best set of parameters.

By a process similar to that for extracting the
coefficients cx_sup representing the compressed upper
envelope LX_sup of the spectrum of the signal, the
coder determines the coefficients cx_inf representing a
compressed lower envelope LX_inf. A module 30 extracts
from the compressed spectrum LX spectral amplitudes
associated with frequencies situated in zones of the
spectrum which are intermediate with respect to the
frequencies which are multiples of the estimated
fundamental frequency F_0.

In the example illustrated by figures 4 and 5, each
amplitude associated with a frequency situated in a
zone intermediate between two successive harmonics k.F_0
and (k+1).F_0 corresponds simply to the modulus of the
spectrum for the frequency (k+1/2).F_0 situated in the
middle of the interval separating the two harmonics. In
another embodiment, this amplitude could be an average
of the modulus of the spectrum over a small span
surrounding this frequency (k+1/2).F_0.

A module 31 carries out an interpolation, for example
linear, of the spectral amplitudes associated with the
frequencies situated in the intermediate zones so as to
obtain the compressed lower envelope LX_inf.

The cepstral transformation applied to this compressed
lower envelope LX_inf is performed according to a
frequency scale resulting from a nonlinear distortion
applied by a module 32. The IFFT module 33 calculates a
cepstral vector of NCI cepstral coefficients cx_inf of
orders 0 to NCI-1 representing the lower envelope. NCI
is a number which may be substantially smaller than
NCS, for example NCI = 4.

The nonlinear transformation of the frequency scale for
the cepstral transformation of the lower envelope can
be carried out to a scale which is finer at the high
frequencies than at the low frequencies, thereby
advantageously allowing good modeling of the unvoiced
components of the signal at the high frequencies.
However, to ensure homogeneity of representation
between the upper envelope and the lower envelope, the
same scale will preferably be adopted in the module 32
as in the module 12 (Mel in the example considered).

The cepstral coefficients cx_inf representing the
compressed lower envelope are quantized by a module 34,
which may operate in the same manner as the module 18
for quantizing the cepstral coefficients representing
the compressed upper envelope. In the case considered,
where we restricted ourselves to NCI = 4 cepstral
coefficients for the lower envelope, the vector thus
formed is subjected to a prediction residual vector
quantization performed by means identical to those
represented in figure 6 but without subdivision into
subvectors. The quantization index icx = icxi
determined by the vector quantizer 22 for each frame
relating to the coefficients cx_inf is provided to the
output multiplexer 6 of the coder.

The coder represented in figure 1 does not comprise any
particular device for coding the phases of the spectrum
at the harmonics of the audio signal.

On the other hand, it comprises means 36-40 for coding
time information related to the phase of the
nonharmonic component represented by the lower
envelope.

A spectral decompression module 36 and an IFFT module
37 form a temporal estimate of the frame of the
nonharmonic component. The module 36 applies a
decompression function which is the reciprocal of the
compression function applied by the module 8 (that is
to say an exponential or a 1/γ power function) to the
compressed lower envelope LX_inf produced by the
interpolation module 31. This provides the modulus of
the estimated frame of the nonharmonic component, whose
phase is taken equal to that φ_X of the spectrum of the
signal X over the frame. The inverse Fourier transform
performed by the module 37 provides the estimated frame
of the nonharmonic component.

The module 38 subdivides this estimated frame of the 
nonharmonic component into several time segments. The 
frame delivered by the module 37 being made up of 
2N = 512 weighted samples, as illustrated by figure 3, 
the module 38 considers only the first N/2 = 128 
samples and the last N/2 = 128 samples, and subdivides 
them, for example, into eight segments of 32 
consecutive samples each representing 4 ms of signal. 

For each segment, the module 38 calculates the energy
equal to the sum of the squares of the samples, and
forms a vector E1 formed of eight positive real
components equal to the eight calculated energies. The
largest of these eight energies, denoted EM, is also
determined so as to be provided, with the vector E1, to
a normalizing module 39. The latter divides each
component of the vector E1 by EM, so that the
normalized vector Emix is formed of eight components
lying between 0 and 1. It is this normalized vector
Emix, or weighting vector, which is subjected to the
quantization by the module 40. The latter can carry out
a vector quantization with a dictionary determined
during prior learning. The quantization index iEm is
provided by the module 40 to the output multiplexer 6
of the coder.
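
The formation of the weighting vector Emix may be sketched as
follows. The example assumes a real-valued frame of 2N samples
laid out as at the output of the inverse transform (first N/2 and
last N/2 samples useful, zeros in between) and, as a choice of
this illustration only, it places the last N/2 samples before the
first N/2 so that the eight segments follow one another in time.
The function name is hypothetical.

```python
def energy_weighting(frame_2n, n_segments=8):
    """Return (Emix, EM) for a frame of 2N samples."""
    half = len(frame_2n) // 4                   # N/2 samples
    # Assumed ordering: undo the circular permutation of the frame
    useful = frame_2n[-half:] + frame_2n[:half]
    seg_len = len(useful) // n_segments         # 32 samples when N = 256
    e1 = [sum(s * s for s in useful[j * seg_len:(j + 1) * seg_len])
          for j in range(n_segments)]
    em = max(e1)                                # largest segment energy
    emix = [e / em for e in e1] if em > 0 else [0.0] * n_segments
    return emix, em

# Toy frame: amplitude 1 over the first half in time, 2 over the second half
frame = [2.0] * 128 + [0.0] * 256 + [1.0] * 128
emix, em = energy_weighting(frame)
```

In this toy case the four later segments carry four times the
energy of the four earlier ones, so the components of Emix are
0.25 and 1 respectively.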

Figure 7 shows a variant embodiment of the means
employed by the coder of figure 1 to determine the
energy weighting vector Emix for the frame of the
nonharmonic component. The spectral decompression and
IFFT modules 36, 37 operate like those which bear the
same references in figure 1. A selection module 42 is
added so as to determine the value of the modulus of
the spectrum subjected to the inverse Fourier transform
37. On the basis of the estimated fundamental frequency
F_0, the module 42 identifies harmonic regions and
nonharmonic regions of the spectrum of the audio
signal. For example, a frequency will be regarded as
belonging to a harmonic region if it is located in a
frequency interval centered on a harmonic k.F_0 and of
width corresponding to a synthesized spectral line
width, and to a nonharmonic region otherwise. In the
nonharmonic regions, the complex signal subjected to
the IFFT 37 is equal to the value of the spectrum, that
is to say its modulus and its phase correspond to the
values |X| and φ_X provided by the FFT module 3. In the
harmonic regions, this complex signal has the same
phase φ_X as the spectrum and a modulus given by the
lower envelope after spectral decompression 36.
Proceeding thus according to figure 7 achieves more
accurate modeling of the nonharmonic regions.
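
The selection between harmonic and nonharmonic regions may be
sketched as follows. The spectral line width expressed in hertz
and the function name are hypothetical parameters of this
illustration; the text states only that the width corresponds to
that of a synthesized spectral line. The lower envelope is
assumed to be available in decompressed form for every bin.

```python
import cmath

def mix_spectrum(spectrum, lower_env, f0, fe, line_width_hz):
    """Keep the spectrum in nonharmonic regions; elsewhere use the
    lower-envelope modulus with the phase of the spectrum."""
    n_fft = len(spectrum)
    mixed, harmonic = [], []
    for i, value in enumerate(spectrum):
        freq = min(i, n_fft - i) * fe / n_fft   # absolute frequency of bin i
        k = round(freq / f0)                    # order of the nearest harmonic
        in_line = k >= 1 and abs(freq - k * f0) <= line_width_hz / 2.0
        harmonic.append(in_line)
        if in_line:
            mixed.append(cmath.rect(lower_env[i], cmath.phase(value)))
        else:
            mixed.append(value)
    return mixed, harmonic
```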

The decoder represented in figure 8 comprises an input
demultiplexer 45 which extracts from the binary stream
Φ, emanating from a coder according to figure 1, the
quantization indices iF, icxs, icxi, iEm for the
fundamental frequency F_0, the cepstral coefficients
representing the compressed upper envelope, the
coefficients representing the compressed lower
envelope, and the weighting vector Emix, and
distributes them respectively to modules 46, 47, 48 and
49. These modules 46-49 comprise quantization
dictionaries similar to those of the modules 5, 18, 34
and 40 of figure 1, so as to restore the values of the
quantized parameters. The modules 47 and 48 have
dictionaries so as to form the quantized prediction
residuals rcx_q[n], and they deduce therefrom the
quantized cepstral vectors cx_q[n] with elements
identical to the elements 23-26 of figure 6. These
quantized cepstral vectors cx_q[n] provide the cepstral
coefficients cx_sup_q and cx_inf_q processed by the
decoder.

A module 51 calculates the fast Fourier transform of
the cepstral coefficients cx_sup for each signal frame.
The frequency scale of the compressed spectrum
resulting therefrom is modified nonlinearly by a module
52 applying the nonlinear transformation reciprocal to
that of the module 12 of figure 1, and which provides
the estimate LX_sup of the compressed upper envelope. A
spectral decompression of LX_sup, carried out by a
module 53, provides the upper envelope X_sup comprising
the estimated values of the modulus of the spectrum at
the frequencies which are multiples of the fundamental
frequency F_0. The module 54 synthesizes the spectral
estimate X_v of the harmonic component of the audio
signal, through a sum of spectral lines centered on the
frequencies which are multiples of the fundamental
frequency F_0 and whose amplitudes (in modulus) are
those given by the upper envelope X_sup.

Although the digital input stream Φ does not comprise
any specific information regarding the phase of the
spectrum of the signal at the harmonics of the
fundamental frequency, the decoder of figure 8 is
capable of extracting information regarding this phase
from the cepstral coefficients cx_sup_q representing
the compressed upper envelope. This phase information
is used to assign a phase φ(k) to each of the spectral
lines determined by the module 54 in the estimate of
the harmonic component of the signal.

As a first approximation, the speech signal may be
regarded as being of minimum phase. Moreover, it is
known that the minimum phase information may be deduced
easily from cepstral modeling. This minimum phase
information is therefore calculated for each harmonic
frequency. The minimum phase assumption signifies that
the energy of the synthesized signal is localized at
the start of each period of the fundamental frequency
F_0.

In order to be closer to a real speech signal, slight
dispersion is introduced by means of a specific
post-liftering of the cepstra during synthesis of the phase.
With this post-liftering, performed by the module 55 of
figure 8, it is possible to emphasize the formant
resonances of the envelope and hence to control the
dispersion of the phases. This post-liftering is, for
example, of the form (8).

To limit the phase breaks, it is preferable to smooth
the post-liftered cepstral coefficients, this being
performed by the module 56. The module 57 deduces from
the post-liftered and smoothed cepstral coefficients
the minimum phase assigned to each spectral line
representing a harmonic peak of the spectrum.

The operations performed by the modules 56, 57 for
smoothing and extracting the minimum phase are
illustrated by the flowchart of figure 9. The module 56
examines the variations in the cepstral coefficients so
as to apply lesser smoothing in the presence of abrupt
variations than in the presence of slow variations. To
do this, it performs the smoothing of the cepstral
coefficients by means of a forget factor λ_c chosen as a
function of a comparison between a threshold d_th and a
distance d between two successive sets of post-liftered



WO 01/03116 - 24 - PCT/FR00/01904 

cepstral coefficients. The threshold d_th is itself
adapted as a function of the variations of the cepstral
coefficients.

The first step 60 consists in calculating the distance
d between the two successive vectors relating to frames
n-1 and n. These vectors, here denoted cxp[n-1] and
cxp[n], correspond for each frame to the collection of
NCS post-liftered cepstral coefficients representing
the compressed upper envelope. The distance used may in
particular be the Euclidean distance between the two
vectors or else a quadratic distance.

Two smoothings are first performed, respectively by
means of forget factors λ_min and λ_max, so as to determine
a minimum distance d_min and a maximum distance d_max. The
threshold d_th is then determined in step 70 as being
situated between the minimum and maximum distances d_min,
d_max: d_th = β·d_max + (1-β)·d_min, the coefficient β being,
for example, equal to 0.5.

In the example represented, the forget factors λ_min and
λ_max are themselves selected from among two distinct
values, respectively λ_min1, λ_min2 and λ_max1, λ_max2 lying
between 0 and 1, the values λ_min1, λ_max1 each being
substantially nearer to 0 than the values λ_min2, λ_max2.
If d > d_min (test 61), the forget factor λ_min is equal to
λ_min1 (step 62); otherwise, it is taken equal to λ_min2
(step 63). In step 64, the minimum distance d_min is
taken equal to λ_min·d_min + (1-λ_min)·d. If d > d_max (test
65), the forget factor λ_max is equal to λ_max1 (step 66);
otherwise, it is taken equal to λ_max2 (step 67). In
step 68, the maximum distance d_max is taken equal to
λ_max·d_max + (1-λ_max)·d.

If the distance d between the two consecutive cepstral
vectors is greater than the threshold d_th (test 71),
then a value λ_c1 relatively close to 0 is adopted for
the forget factor λ_c (step 72). In this case, the
corresponding signal is regarded as being of
nonstationary type, so that there is no need to keep a
large memory of the earlier cepstral coefficients. If
d ≤ d_th, a value λ_c2 which is not as close to 0 is
adopted in step 73 for the forget factor λ_c, so as to
further smooth the cepstral coefficients. The smoothing
is performed in step 74, where the vector cxl[n] of
smoothed coefficients for the current frame n is
determined by:

$$ cxl[n] = \lambda_c \cdot cxl[n-1] + (1 - \lambda_c) \cdot cxp[n] \tag{11} $$
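
As an illustration only, steps 60 to 74 of figure 9 might be sketched as follows in Python. The numerical forget factors, the dictionary-based state and the name `smooth_step` are assumptions made for this sketch and do not come from the description; only the weighting 0.5 of the threshold follows the example given for step 70.

```python
import numpy as np

# Assumed forget factors (illustrative values only, not from the description).
LAM_MIN1, LAM_MIN2 = 0.1, 0.9
LAM_MAX1, LAM_MAX2 = 0.1, 0.9
LAM_C1, LAM_C2 = 0.1, 0.7
BETA = 0.5  # weighting of d_max in the threshold (example value of step 70)


def smooth_step(state, cxp_prev, cxp_cur):
    """Run steps 60-74 for one frame.

    state: dict with keys "d_min", "d_max" and "cxl" (previous smoothed vector).
    Returns the forget factor chosen for the frame; state is updated in place.
    """
    d = float(np.linalg.norm(cxp_cur - cxp_prev))                   # step 60

    lam_min = LAM_MIN1 if d > state["d_min"] else LAM_MIN2          # test 61
    state["d_min"] = lam_min * state["d_min"] + (1 - lam_min) * d   # step 64

    lam_max = LAM_MAX1 if d > state["d_max"] else LAM_MAX2          # test 65
    state["d_max"] = lam_max * state["d_max"] + (1 - lam_max) * d   # step 68

    d_th = BETA * state["d_max"] + (1 - BETA) * state["d_min"]      # step 70

    lam_c = LAM_C1 if d > d_th else LAM_C2                          # test 71
    state["cxl"] = lam_c * state["cxl"] + (1 - lam_c) * cxp_cur     # step 74, (11)
    return lam_c
```

With these assumed values, an abrupt change between two frames selects the small forget factor, so the smoothed vector follows the new coefficients almost immediately, whereas a stationary passage selects the larger one.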

The module 57 then calculates the minimum phases φ(k)
associated with the harmonics k·F_0. In a known manner,
the minimum phase for a harmonic of order k is given
by:

$$ \varphi(k) = -2 \sum_{m=1}^{NCS-1} cxl[n,m] \cdot \sin(2\pi m k F_0 / F_e) \tag{12} $$

where cxl[n,m] designates the smoothed cepstral
coefficient of order m for frame n.

In step 75, the harmonic index k is initialized to 1.
To initialize the calculation of the minimum phase
assigned to harmonic k, the phase φ(k) and the cepstral
index m are initialized to 0 and 1 respectively in
step 76. In step 77, the module 57 adds the quantity
-2·cxl[n,m]·sin(2πmk·F_0/F_e) to the phase φ(k). The
cepstral index m is incremented in step 78 and compared
with NCS in step 79. Steps 77 and 78 are repeated so
long as m < NCS. When m = NCS, the calculation of the
minimum phase is terminated for harmonic k, and the
index k is incremented in step 80. The calculation of
minimum phases 76-79 is rerun for the next harmonic so
long as k·F_0 < F_e/2 (test 81).
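
The loop of steps 75 to 81 can be sketched in Python as below. The function name `minimum_phases` and its argument layout are hypothetical; the sketch assumes the smoothed coefficients of the current frame are held in an array indexed by cepstral order, so that orders 1 to NCS-1 enter the sum of relation (12).

```python
import numpy as np


def minimum_phases(cxl_n, f0, fe):
    """Return the minimum phase of every harmonic k >= 1 such that k*f0 < fe/2.

    cxl_n: smoothed cepstral coefficients of the frame, orders 0 .. NCS-1.
    """
    cxl_n = np.asarray(cxl_n, dtype=float)
    m = np.arange(1, len(cxl_n))          # cepstral orders 1 .. NCS-1
    phases = []
    k = 1                                 # step 75
    while k * f0 < fe / 2:                # test 81
        # steps 76-79: accumulate -2*cxl[n,m]*sin(2*pi*m*k*F0/Fe) over m
        phi = -2.0 * np.sum(cxl_n[1:] * np.sin(2 * np.pi * m * k * f0 / fe))
        phases.append(phi)
        k += 1                            # step 80
    return np.array(phases)
```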

In the exemplary embodiment according to figure 8, the
module 54 takes account of a constant phase over the
width of each spectral line, equal to the minimum phase
φ(k) provided for the corresponding harmonic k by the
module 57.

The estimate X_v of the harmonic component is
synthesized by summation of spectral lines positioned
at the harmonic frequencies of the fundamental
frequency F_0. During this synthesis, it is possible to
position the spectral lines on the frequency axis with
a higher resolution than the resolution of the Fourier
transform. To do this, a reference spectral line is
precalculated once and for all according to the higher
resolution. This calculation can consist of a Fourier
transform of the analysis window f_A with a transform
size of 16384 points, achieving a resolution of about
0.5 Hz per point. The synthesis of each harmonic line is
then performed by the module 54 by positioning on the
frequency axis the reference line with high resolution,
and by undersampling this reference spectral line so as
to reduce to the resolution of 15.625 Hz of the Fourier
transform on 512 points. This enables the spectral line
to be positioned accurately.
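
A minimal Python sketch of this positioning is given below. It assumes a sampling frequency of 8 kHz (256 samples in 32 ms) and uses NumPy's Hamming window as a stand-in for the analysis window; the names `REF_LINE` and `harmonic_line` are hypothetical, and the circular permutation used for phase compensation is deliberately left out.

```python
import numpy as np

FE = 8000      # assumed sampling frequency in Hz (256 samples = 32 ms)
N = 256        # length of the analysis window
NFFT = 512     # size of the synthesis Fourier transform (15.625 Hz per point)
NREF = 16384   # size of the high-resolution transform (about 0.5 Hz per point)

# Reference spectral line: high-resolution spectrum of the window, computed once.
REF_LINE = np.fft.fft(np.hamming(N), NREF)


def harmonic_line(freq_hz, amplitude=1.0):
    """Spectral line centered on freq_hz, sampled on the 512-point grid."""
    shift = int(round(freq_hz * NREF / FE))   # line center in high-resolution points
    step = NREF // NFFT                       # 32 high-resolution points per bin
    idx = (np.arange(NFFT) * step - shift) % NREF
    return amplitude * REF_LINE[idx]          # undersampled, shifted reference line
```

Because the shift is expressed in high-resolution points, a harmonic at 1003 Hz and one at 1000 Hz give different sampled lines even though both peak in the same 512-point bin.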

For the determination of the lower envelope, the FFT
module 85 of the decoder of figure 8 receives the NCI
quantized cepstral coefficients cx_inf_q of orders 0 to
NCI - 1, and it advantageously supplements them with
the NCS - NCI cepstral coefficients cx_sup_q of order
NCI to NCS - 1 representing the upper envelope.
Specifically, it may be estimated that, as a first
approximation, the fast variations of the compressed
lower envelope are well reproduced by those of the
compressed upper envelope. In another embodiment, the
FFT module 85 could consider only the NCI cepstral
parameters cx_inf_q.

The module 86 converts the frequency scale in a manner
reciprocal to the conversion carried out by the module
32 of the coder, so as to restore the estimate LX_inf
of the compressed lower envelope, subjected to the
spectral decompression module 87. At the output of the
module 87, the decoder is furnished with a lower
envelope X_inf comprising the values of the modulus of
the spectrum in the valleys situated between the
harmonic peaks.

This envelope X_inf will modulate the spectrum of a
noise frame whose phase is processed as a function of
the quantized weighting vector Emix extracted by the
module 49. A generator 88 delivers a normalized noise
frame whose 4-ms segments are weighted in a module 89
in accordance with the normalized components of the
vector Emix provided by the module 49 for the current
frame. This noise is white noise high-pass filtered so
as to take account of the low level which in principle
the unvoiced component has at the low frequencies. On
the basis of the energy-weighted noise, the module 90
forms frames of 2N = 512 samples by applying the
analysis window f_A, the insertion of 256 zero samples
and the circular permutation for phase compensation in
accordance with what was explained with reference to
figure 3. The Fourier transform of the resulting frame
is calculated by the FFT module 91.

The spectral estimate X_uv of the nonharmonic component
is determined by the spectral synthesis module 92 which
performs a frequency-by-frequency weighting. This
weighting consists in multiplying each complex spectral
value provided by the FFT module 91 by the value of the
lower envelope X_inf obtained for the same frequency by
the spectral decompression module 87.

The spectral estimates X_v, X_uv of the harmonic (voiced
in the case of a speech signal) and nonharmonic (or
unvoiced) components are combined by a mixing module 95
controlled by a module 96 for analyzing the degree of
harmonicity (or of voicing) of the signal.

The organization of these modules 95, 96 is illustrated
by figure 10. The analysis module 96 comprises a unit
97 for estimating a frequency-dependent degree of
voicing W from which are calculated four
frequency-dependent gains, namely two gains g_v, g_uv controlling
the relative magnitude of the harmonic and nonharmonic
components in the synthesized signal, and two gains
g_v_φ, g_uv_φ used to add noise to the phase of the
harmonic component.

The degree of voicing W(i) is a continuously varying
value lying between 0 and 1 determined for each
frequency index i (0 ≤ i < N) as a function of the
upper envelope X_sup(i) and of the lower envelope
X_inf(i) which are obtained for this frequency i by the
decompression modules 53, 87. The degree of voicing
W(i) is estimated by the unit 97 for each frequency
index i corresponding to a harmonic of the fundamental
frequency F_0, namely $i = \lfloor 2Nk \, F_0/F_e + 1/2 \rfloor$ for k = 1, 2, ...,
by an increasing function of the ratio of the upper
envelope X_sup to the lower envelope X_inf at this
frequency, for example according to the formula:

$$ W(i) = \min\left\{ 1, \; \frac{10 \log_{10}\left[ X\_sup(i) / X\_inf(i) \right]}{Vth(F_0)} \right\} \tag{13} $$

The threshold Vth(F_0) corresponds to the average
dynamic swing calculated over a purely voiced synthetic
spectrum at the fundamental frequency. It is
advantageously chosen to be dependent on the
fundamental frequency F_0.

The degree of voicing W(i) for a frequency other than
the harmonic frequencies is obtained simply as being
equal to that estimated for the closest harmonic.
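
The estimation of the degree of voicing might be sketched as follows in Python. This is a sketch under stated assumptions: the threshold is passed in as a plain number of decibels (`vth_db`), the logarithmic form of relation (13) is taken as 10·log10 of the envelope ratio, negative values are clipped to 0, and the function name is hypothetical.

```python
import numpy as np


def degree_of_voicing(x_sup, x_inf, f0, fe, vth_db):
    """Degree of voicing W(i) for 0 <= i < N from the two envelopes."""
    x_sup = np.asarray(x_sup, dtype=float)
    x_inf = np.asarray(x_inf, dtype=float)
    n = len(x_sup)
    # Harmonic orders k with k*f0 < fe/2, and their indices on the 2N-point grid.
    k = np.arange(1, int(np.ceil(fe / (2 * f0))))
    idx = np.floor(2 * n * k * f0 / fe + 0.5).astype(int)
    idx = idx[idx < n]
    # Relation (13) at the harmonics (lower clip at 0 is an assumption).
    swing_db = 10 * np.log10(x_sup[idx] / x_inf[idx])
    w_harm = np.clip(swing_db / vth_db, 0.0, 1.0)
    # Other frequencies take the value of the closest harmonic.
    nearest = np.argmin(np.abs(np.arange(n)[:, None] - idx[None, :]), axis=1)
    return w_harm[nearest]
```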

The gain g_v(i), which depends on the frequency, is
obtained by applying a nonlinear function to the degree
of voicing W(i) (block 98). This nonlinear function
has, for example, the form represented in figure 11:

$$ g_v(i) = \begin{cases} 0 & \text{if } 0 \le W(i) < W1 \\ \dfrac{W(i) - W1}{W2 - W1} & \text{if } W1 \le W(i) < W2 \\ 1 & \text{if } W2 \le W(i) \le 1 \end{cases} \tag{14} $$

the thresholds W1, W2 being such that 0 < W1 < W2 < 1.
The gain g_uv can be calculated in a similar manner to
the gain g_v (the sum of the two gains g_v, g_uv being
constant, for example equal to 1), or deduced simply
from the latter through the relation g_uv(i) = 1 - g_v(i),
as shown diagrammatically by the subtractor 99 in
figure 10.
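
As a hedged illustration, the piecewise-linear gain of relation (14) and the complementary gain can be written compactly with a clipped ramp. The function names and the threshold values used in the usage line are hypothetical.

```python
import numpy as np


def voiced_gain(w, w1, w2):
    """Gain g_v: 0 below w1, linear ramp between w1 and w2, 1 above w2."""
    w = np.asarray(w, dtype=float)
    return np.clip((w - w1) / (w2 - w1), 0.0, 1.0)


def unvoiced_gain(w, w1, w2):
    """Gain g_uv deduced through g_uv = 1 - g_v (subtractor 99)."""
    return 1.0 - voiced_gain(w, w1, w2)


# Usage with assumed thresholds W1 = 0.2 and W2 = 0.8.
g_v = voiced_gain([0.0, 0.2, 0.5, 0.8, 1.0], 0.2, 0.8)
```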

It is beneficial to be able to add noise to the phase
of the harmonic component of the signal at a given
frequency if the analysis of the degree of voicing
shows that the signal is actually of nonharmonic type
at this frequency. To do this, the phase φ'_v of the
mixed harmonic component is the result of a linear
combination of the phases φ_v, φ_uv of the harmonic and
nonharmonic components X_v, X_uv synthesized by the
modules 54, 92.

The gains g_v_φ, g_uv_φ respectively applied to these
phases are calculated from the degree of voicing W and
also weighted as a function of the frequency index i,
given that the adding of noise to the phase is actually
useful only beyond a certain frequency.

A first gain g_v1_φ is calculated by applying a nonlinear
function to the degree of voicing W(i), as shown
diagrammatically by the block 100 in figure 10. This
nonlinear function can have the form represented in
figure 12:

$$ g\_v1\_\varphi(i) = \begin{cases} G1 & \text{if } 0 \le W(i) < W3 \\ G1 + (1 - G1) \dfrac{W(i) - W3}{W4 - W3} & \text{if } W3 \le W(i) < W4 \\ 1 & \text{if } W4 \le W(i) \le 1 \end{cases} \tag{15} $$

the thresholds W3 and W4 being such that 0 < W3 < W4
< 1, and the minimum gain G1 lying between 0 and 1.

A multiplier 101 multiplies for each frequency of index
i the gain g_v1_φ by another gain g_v2_φ dependent only on
the frequency index i, so as to form the gain g_v_φ(i).
The gain g_v2_φ(i) depends nonlinearly on the frequency
index i, for example as indicated in figure 13:

$$ g\_v2\_\varphi(i) = \begin{cases} 1 & \text{if } 0 \le i < i1 \\ 1 - (1 - G2) \dfrac{i - i1}{i2 - i1} & \text{if } i1 \le i < i2 \\ G2 & \text{if } i2 \le i < N \end{cases} \tag{16} $$

the indices i1 and i2 being such that 0 < i1 < i2 < N,
and the minimum gain G2 lying between 0 and 1. The gain
g_uv_φ(i) can be calculated simply as being equal to
1 - g_v_φ(i) = 1 - g_v1_φ(i)·g_v2_φ(i) (subtractor 102 of
figure 10).

The complex spectrum Y of the synthesized signal is
produced by the mixing module 95, which carries out the
following mixing relation, for 0 ≤ i < N:

$$ Y(i) = g_v(i) \cdot |X_v(i)| \cdot \exp[j \varphi'_v(i)] + g_{uv}(i) \cdot X_{uv}(i) \tag{17} $$

$$ \text{with } \varphi'_v(i) = g\_v\_\varphi(i) \cdot \varphi_v(i) + g\_uv\_\varphi(i) \cdot \varphi_{uv}(i) \tag{18} $$

where φ_v(i) designates the argument of the complex
number X_v(i) provided by the module 54 for the
frequency of index i (block 104 of figure 10), and
φ_uv(i) designates the argument of the complex number
X_uv(i) provided by the module 92 (block 105 of
figure 10). This combination is carried out by the
multipliers 106-110 and the adders 111-112 represented
in figure 10.
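
Relations (17) and (18) translate almost directly into array operations. The following Python sketch assumes that the two spectral estimates and the four gains are available as NumPy arrays of the same length; the function name `mix_spectra` is hypothetical.

```python
import numpy as np


def mix_spectra(x_v, x_uv, g_v, g_uv, g_v_phi, g_uv_phi):
    """Mix the harmonic and nonharmonic spectral estimates."""
    phi_v = np.angle(x_v)      # argument of X_v(i), block 104
    phi_uv = np.angle(x_uv)    # argument of X_uv(i), block 105
    phi_mixed = g_v_phi * phi_v + g_uv_phi * phi_uv                    # (18)
    return g_v * np.abs(x_v) * np.exp(1j * phi_mixed) + g_uv * x_uv    # (17)
```

When the phase gains are 1 and 0 and the magnitude gains are 1 and 0, the mixed spectrum reduces to the harmonic estimate; with the magnitude gains swapped it reduces to the nonharmonic estimate.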

The mixed spectrum Y(i) for 0 ≤ i < 2N (with Y(2N-1-i)
= Y(i)) is then transformed into the time domain by the
IFFT module 115 (figure 8). Only the first N/2 = 128
and the last N/2 = 128 samples of the frame of 2N = 512
samples produced by the module 115 are retained, and
the circular permutation inverse to that illustrated by
figure 3 is applied to obtain the synthesized frame of
N = 256 samples weighted by the analysis window f_A.
The frames obtained successively in this manner are
finally processed by the temporal synthesis module 116
which forms the decoded audio signal x.

The temporal synthesis module 116 performs an overlap
sum of frames modified with respect to those evaluated
successively at the output of the module 115. The
modification may be viewed in two steps illustrated by
figures 14 and 15 respectively.

The first step (figure 14) consists in multiplying each
frame 2' delivered by the IFFT module 115 by a window
1/f_A inverse to the analysis window f_A employed by the
module 1 of the coder. The samples of the frame 2"
resulting therefrom are therefore uniformly weighted.

The second step (figure 15) consists in multiplying the
samples of this frame 2" by a synthesis window f_s
satisfying the following properties:

$$ f_s(N - L + i) + f_s(i) = A \quad \text{for } 0 \le i < L \tag{19} $$

$$ f_s(i) = A \quad \text{for } L \le i < N - L \tag{20} $$

where A designates an arbitrary positive constant, for
example A = 1. The synthesis window f_s(i) increases
progressively from 0 to A for i going from 0 to L. It
is, for example, a raised half-sinusoid:

$$ f_s(i) = \frac{A}{2} \left( 1 - \cos\left[ (i + 1/2) \pi / L \right] \right) \quad \text{for } 0 \le i < L \tag{21} $$

After having reweighted each frame 2" by the synthesis
window f_s, the module 116 positions the successive
frames with their time shifts of M = 160 samples and
their time overlaps of L = 96 samples, then it sums the
frames thus positioned over time. Owing to the
properties (19) and (20) of the synthesis window f_s,
each sample of the decoded audio signal x thus obtained
is assigned a uniform global weight, equal to A. This
global weight originates from the contribution of a
single frame if the sample has in this frame a rank i
such that L ≤ i < N - L, and comprises the summed
contributions of two successive frames if 0 ≤ i < L
or N - L ≤ i < N.

It is thus possible to perform the temporal synthesis
in a simple manner even if, as in the case considered,
the overlap L between two successive frames is smaller
than half the size N of these frames.
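
The uniform global weight can be checked numerically. The Python sketch below builds a synthesis window from relations (19) to (21) with N = 256, M = 160 and L = 96, then overlap-adds shifted copies of it; the helper names are hypothetical and A = 1 is the example value from the text.

```python
import numpy as np

N, M = 256, 160
L = N - M   # overlap of 96 samples between successive frames


def synthesis_window(n=N, l=L, a=1.0):
    """Window equal to a in the middle, with raised half-sinusoid edges."""
    i = np.arange(l)
    rise = 0.5 * a * (1 - np.cos((i + 0.5) * np.pi / l))   # relation (21)
    fs = np.full(n, a)                                     # relation (20)
    fs[:l] = rise
    fs[n - l:] = a - rise                                  # relation (19)
    return fs


def overlap_add(frames, shift=M):
    """Sum frames positioned with a time shift of `shift` samples."""
    out = np.zeros(shift * (len(frames) - 1) + len(frames[0]))
    for j, frame in enumerate(frames):
        out[j * shift: j * shift + len(frame)] += frame
    return out


# Overlap-adding the window itself shows the global weight of each sample.
weights = overlap_add([synthesis_window()] * 4)
```

Apart from the first and last L samples, which are covered by a single rising or falling edge, every entry of `weights` equals A.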

The two steps set forth above for modifying the signal
frames may be merged into a single step. It is
sufficient to precalculate a compound window
f_c(i) = f_s(i)/f_A(i) and simply to multiply the frames 2'
of N = 256 samples delivered by the module 115 by the
compound window f_c before performing the overlap
summation.

Figure 16 shows the shape of the compound window f_c in
the case where the analysis window f_A is a Hamming
window and the synthesis window f_s has the form given
by relations (19) to (21).

Other forms of the synthesis window f_s satisfying
relations (19) and (20) may be employed. In the variant
of figure 17, it is a piecewise affine function defined
by:

$$ f_s(i) = A \cdot i / L \quad \text{for } 0 \le i < L \tag{22} $$

In order to improve the quality of coding of the audio
signal, the coder of figure 1 can increase the rate of
formation and of analysis of the frames, so as to
transmit more quantization parameters to the decoder.
In the frame structure represented in figure 2, a frame
of N = 256 samples (32 ms) is formed every 20 ms. These
frames of 256 samples could be formed at a higher rate,
for example every 10 ms, two successive frames then having a
shift of M/2 = 80 samples and an overlap of 176
samples.

Under these conditions, it is possible to transmit the
complete sets of quantization parameters iF, icxs,
icxi, iEm for just one subcollection of frames, and to
transmit, for the other frames, parameters making it
possible to perform a suitable interpolation at the
level of the decoder. In the example envisaged
hereinabove, the subcollection for which complete
parameter sets are transmitted may consist of the
frames of integer rank n, whose periodicity is
M/F_e = 20 ms, and the frames for which an interpolation
is performed may be those of half-integer rank n + 1/2
which are shifted by 10 ms with respect to the frames
of the subcollection.

In the embodiment illustrated by figure 18, the
notation cx_q[n-1] and cx_q[n] designates quantized
cepstral vectors determined, for two successive frames
of integer rank, by the quantization module 18 and/or
by the quantization module 34. These vectors comprise,
for example, four consecutive cepstral coefficients
each. They could also comprise more cepstral
coefficients.

A module 120 performs an interpolation of these two
cepstral vectors cx_q[n-1] and cx_q[n] so as to
estimate an intermediate value cx_i[n-1/2]. The
interpolation performed by the module 120 can be a
simple arithmetic average of the vectors cx_q[n-1] and
cx_q[n]. As a variant, the module 120 could apply a
more sophisticated interpolation formula, for example
polynomial, based also on the cepstral vectors obtained
for frames earlier than frame n-1. Moreover, if more
than one interpolated frame is interposed between two
consecutive frames of integer rank, the interpolation
takes account of the relative position of each
interpolated frame.

With the aid of the means described above, the coder
also calculates the cepstral coefficients cx[n-1/2]
relating to the frame of half-integer rank. In the case
of the upper envelope, these cepstral coefficients are
those provided by the IFFT module 13 after
post-liftering 15 (for example with the same post-liftering
coefficients as for the previous frame n-1) and
normalization 16. In the case of the lower envelope,
the cepstral coefficients cx[n-1/2] are those delivered
by the IFFT module 33.

A subtractor 121 forms the difference ecx[n-1/2]
between the cepstral coefficients cx[n-1/2] calculated
for the frame of half-integer rank and the coefficients
cx_i[n-1/2] estimated by interpolation. This difference
is provided to a quantization module 122 which
addresses quantization indices icx[n-1/2] to the output
multiplexer 6 of the coder. The module 122 operates,
for example, by vector quantization of the
interpolation errors ecx[n-1/2] determined successively
for the frames of half-integer rank.

This quantization of the interpolation error can be
performed by the coder for each of the NCS + NCI
cepstral coefficients used by the decoder, or for just
some of them, typically those of smallest orders.

The corresponding means of the decoder are illustrated
by figure 19. The decoder operates essentially like
that described with reference to figure 8 to determine
the signal frames of integer rank. An interpolation
module 124 identical to the module 120 of the coder
estimates the intermediate coefficients cx_i[n-1/2]
from the quantized coefficients cx_q[n-1] and cx_q[n]
provided by the module 47 and/or the module 48 from the
indices icxs, icxi extracted from the stream Φ. A
module for extracting parameters 125 receives the
quantization index icx[n-1/2] from the input
demultiplexer 45 of the decoder, and deduces therefrom
the quantized interpolation error ecx_q[n-1/2] from the
same quantization dictionary as that used by the module
122 of the coder. An adder 126 sums the cepstral
vectors cx_i[n-1/2] and ecx_q[n-1/2] so as to provide
the cepstral coefficients cx[n-1/2] which will be used
by the decoder (modules 51-57, 95, 96, 115 and/or
modules 85-87, 92, 95, 96, 115) so as to form the
interpolated frame of rank n-1/2.

If just some of the cepstral coefficients have formed
the subject of an interpolation error quantization, the
others are determined by the decoder by a simple
interpolation with no correction.
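
A minimal Python sketch of this decoder-side reconstruction follows. It assumes the simple arithmetic average as interpolation and assumes that the corrected coefficients are those of smallest orders, stored at the start of the vector; the function name `decode_half_frame` is hypothetical.

```python
import numpy as np


def decode_half_frame(cx_q_prev, cx_q_cur, ecx_q=None):
    """Cepstral vector of the frame of half-integer rank n-1/2.

    ecx_q: quantized interpolation error for the lowest orders, or None when
    no correction was transmitted.
    """
    cx_q_prev = np.asarray(cx_q_prev, dtype=float)
    cx_q_cur = np.asarray(cx_q_cur, dtype=float)
    cx = 0.5 * (cx_q_prev + cx_q_cur)        # interpolation (module 124)
    if ecx_q is not None:
        ecx_q = np.asarray(ecx_q, dtype=float)
        cx[:len(ecx_q)] += ecx_q             # correction (adder 126)
    return cx
```

Coefficients beyond the length of `ecx_q` keep their purely interpolated value, which corresponds to the case where only some coefficients carry a quantized error.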

The decoder can also interpolate the other parameters
F_0, Emix used to synthesize the signal frames. The
fundamental frequency F_0 can be linearly interpolated,
either in the time domain, or (preferably) directly in
the frequency domain. For the possible interpolation of
the energy weighting vector Emix, it is appropriate to
perform the interpolation after denormalization and
while of course taking account of the time shifts
between frames.

It should be noted that it is especially advantageous,
in order to interpolate the representation of the
spectral envelopes, to perform this interpolation in
the cepstral domain. Unlike an interpolation performed
on other parameters, such as the LSP coefficients
(standing for "Line Spectrum Pairs"), the linear
interpolation of the cepstral coefficients corresponds
to the linear interpolation of the compressed spectral
amplitudes.

In the variant represented in figure 20, the coder uses
the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r]
and cx[n-1/2] calculated for the last frames which
have passed (r ≥ 1) so as to identify an optimal
interpolator filter which, when fed with the quantized
cepstral vectors cx_q[n-r], ..., cx_q[n] relating to
the frames of integer rank, delivers an interpolated
cepstral vector cx_i[n-1/2] which exhibits a minimum
distance with the vector cx[n-1/2] calculated for the
last frame of half-integer rank.

In the example represented in figure 20, this
interpolator filter 128 is present in the coder, and a
subtractor 129 deducts its output cx_i[n-1/2] from the
calculated cepstral vector cx[n-1/2]. A minimization
module 130 determines the parameter set {P} of the
interpolator filter 128, for which the interpolation
error ecx[n-1/2] delivered by the subtractor 129
exhibits a minimum norm. This parameter set {P} is
addressed to a quantization module 131 which provides a
corresponding quantization index iP to the output
multiplexer 6 of the coder.

As a function of the bit rate allocated in the stream Φ
to the indices for quantizing the parameters {P}
defining the optimal interpolator filter 128, it will
be possible to adopt a finer or coarser quantization of
these parameters, or a more or less elaborate form of
the interpolator filter, or else to envisage several
interpolator filters quantized differently for various
vectors of cepstral coefficients.

In a simple embodiment, the interpolator filter 128 is
linear, with r = 1:

$$ cx\_i[n-1/2] = \rho \cdot cx\_q[n-1] + (1 - \rho) \cdot cx\_q[n] \tag{23} $$

and the parameter set {P} is limited to the coefficient
ρ lying between 0 and 1.
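
For the linear interpolator of relation (23), the coefficient minimizing the Euclidean norm of the interpolation error has a closed form. The Python sketch below is one possible way the minimization could be carried out, under the assumption of a least-squares criterion; the description itself only requires a minimum norm, and the function name is hypothetical.

```python
import numpy as np


def optimal_coefficient(cx_half, cx_q_prev, cx_q_cur):
    """Coefficient in [0, 1] minimizing the norm of
    cx_half - (c * cx_q_prev + (1 - c) * cx_q_cur)."""
    cx_half = np.asarray(cx_half, dtype=float)
    cx_q_prev = np.asarray(cx_q_prev, dtype=float)
    cx_q_cur = np.asarray(cx_q_cur, dtype=float)
    diff = cx_q_prev - cx_q_cur
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        return 0.5   # identical neighbours: every coefficient gives the same vector
    c = float(np.dot(cx_half - cx_q_cur, diff)) / denom   # unconstrained optimum
    return min(1.0, max(0.0, c))                          # keep it between 0 and 1
```

Since the error norm is a convex quadratic function of a single coefficient, clipping the unconstrained optimum to the interval [0, 1] yields the constrained optimum.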

From the indices iP for quantizing the parameters {P}
obtained in the binary stream Φ, the decoder
reconstructs the interpolator filter 128 (to within
quantization errors) and processes the cepstral vectors
cx_q[n-r], ..., cx_q[n] so as to estimate the cepstral
coefficients cx[n-1/2] used to synthesize the frames of
half-integer rank.

Generally, the decoder can use a simple interpolation
method (without transmission of parameters by the coder
for the frames of half-integer rank), an
interpolation method with incorporation of a quantized
interpolation error (according to figures 18 and 19),
or an interpolation method with an optimal interpolator
filter (according to figure 20) to evaluate the frames
of half-integer rank in addition to the frames of
integer rank evaluated directly, as explained with
reference to figures 8 to 13. The temporal synthesis
module 116 can then combine the collection of these
evaluated frames so as to form the synthesized signal x
in the manner explained hereinbelow with reference to
figures 14, 21 and 22.

As in the method of temporal synthesis described above,
the module 116 performs an overlap sum of frames
modified with respect to those evaluated successively
at the output of the module 115, and this modification
can be viewed in two steps of which the first is
identical to that described above with reference to
figure 14 (divide the samples of the frame 2' by the
analysis window f_A).

The second step (figure 21) consists in multiplying the
samples of the renormalized frame 2" by a synthesis
window f'_s satisfying the following properties:

$$ f'_s(i) = 0 \quad \text{for } 0 \le i < N/2 - M/p \text{ and } N/2 + M/p \le i < N \tag{24} $$

$$ f'_s(i) + f'_s(i + M/p) = A \quad \text{for } N/2 - M/p \le i < N/2 \tag{25} $$



where A designates an arbitrary positive constant, for
example A = 1, and p is the integer such that the time
shift between the successive frames (calculated
directly and interpolated) is M/p samples, i.e. p = 2
in the example described. The synthesis window f'_s(i)
increases progressively for i going from N/2 - M/p to
N/2. It is, for example, a raised sinusoid on the
interval N/2 - M/p ≤ i < N/2 + M/p. In particular, the
synthesis window f'_s can, over this interval, be a
Hamming window (as represented in figure 21) or a
Hanning window.

Figure 21 shows the successive frames 2" repositioned
over time by the module 116. The hatching indicates the
removed portions of the frames (synthesis window at 0).
It may be seen that by performing the overlap sum of
the samples of the successive frames, the property (25)
ensures homogeneous weighting of the samples of the
synthesized signal.
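
The homogeneous weighting can be verified with a short Python sketch. It assumes N = 256, M = 160 and p = 2, and uses a Hanning-type raised sinusoid with A = 1 over the nonzero interval; the helper names are hypothetical.

```python
import numpy as np

N, M, P = 256, 160, 2
H = M // P   # time shift of 80 samples between successive frames


def interp_synthesis_window(n=N, h=H, a=1.0):
    """Window null outside [n/2 - h, n/2 + h), raised sinusoid inside."""
    fs = np.zeros(n)                                          # property (24)
    j = np.arange(2 * h)
    fs[n // 2 - h: n // 2 + h] = 0.5 * a * (1 - np.cos(np.pi * j / h))
    return fs


def overlap_add(frames, shift=H):
    """Sum frames positioned with a time shift of `shift` samples."""
    out = np.zeros(shift * (len(frames) - 1) + len(frames[0]))
    for j, frame in enumerate(frames):
        out[j * shift: j * shift + len(frame)] += frame
    return out


# Global weight of each output sample when six frames are overlap-added.
weights = overlap_add([interp_synthesis_window()] * 6)
```

Between the start of the second frame's nonzero part and the middle of the last frame's nonzero part, every entry of `weights` equals A.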

As in the method of synthesis illustrated by figures 14
and 15, the procedure for weighting the frames obtained
by inverse Fourier transform of the spectra Y can be
performed in a single step, with a compound window
f'_c(i) = f'_s(i)/f_A(i). Figure 22 shows the form of the
compound window f'_c in the case where the windows f_A and
f'_s are of Hamming type.

Like the method of temporal synthesis illustrated by
figures 14 to 17, that illustrated by figures 14, 21
and 22 makes it possible to take into account an
overlap L between two analysis frames (for which the
analysis is performed completely) which is smaller than
half the size N of these frames. In general, this
latter method is applicable when the successive
analysis frames exhibit mutual time shifts M of more
than N/2 samples (possibly even of more than N samples
if a very low bit rate is required), the interpolation
leading to a collection of frames whose mutual time
shifts are less than N/2 samples.

The interpolated frames can form the subject of a
reduced transmission of coding parameters, as is 
described above, but this is not compulsory. This 
embodiment makes it possible to retain a relatively 
large interval M between two analysis frames, and hence 
to limit the transmission bit rate required, whilst 
limiting the discontinuities which are liable to appear 
by virtue of the size of this interval with respect to 
the typical timescales for the variations in the 
parameters of the audio signal, in particular the 
cepstral coefficients and the fundamental frequency. 

CLAIMS

1. A method of analyzing an audio signal (x)
processed by successive frames of N samples, in
which the samples of each frame are weighted by an
analysis window (f_A) of Hamming, Hanning, Kaiser
or similar type, a spectrum of the audio signal is
calculated by transforming each frame of weighted
samples in the frequency domain, and the spectrum
of the audio signal is processed so as to deliver
parameters (cx_sup, cx_inf, Emix) for synthesizing
a signal derived from the analyzed audio signal,
characterized in that the successive frames
comprise an alternation of frames for which are
delivered complete sets of synthesis parameters
and of frames for which are delivered incomplete
sets of synthesis parameters, and in that the
successive frames for which complete sets of
synthesis parameters are delivered exhibit mutual
overlaps of less than N/2 samples.

2. The method as claimed in claim 1, in which the
incomplete sets of synthesis parameters include
data (icx[n-1/2]) representing an error
(ecx[n-1/2]) of interpolation of at least one of
the synthesis parameters.

3. The method as claimed in claim 1, in which the
incomplete sets of synthesis parameters include
data (iP) representing a filter (128) for
interpolating at least one of the synthesis
parameters.

4. The method as claimed in any one of claims 1 to 3,
in which the processing of the spectrum of the
audio signal (x) comprises an extraction of coding
parameters (cx_sup, cx_inf, Emix) with a view to
the transmission and/or the storage of the coded
audio signal.

5. The method as claimed in any one of claims 1 to 3,
in which the processing of the spectrum of the
audio signal (x) comprises a denoising by spectral
subtraction.

6. An audio processing device, comprising analysis
means for executing a method as claimed in any one
of claims 1 to 5.

7. A method of synthesizing an audio signal, in which
successive spectral estimates (Y) corresponding
respectively to frames of N samples of the audio
signal which are weighted by an analysis window
(f_A) are obtained, the successive frames
exhibiting mutual overlaps of L samples, each
frame of the audio signal is evaluated by
transforming the spectral estimates in the time
domain, and the frames evaluated are combined to
form the synthesized signal (x), characterized in
that each evaluated frame is modified by applying
thereto a processing corresponding to a division
by said analysis window (f_A) and to a
multiplication by a synthesis window (f_s), and the
synthesized signal is formed as an overlap sum of
the modified frames, and in that, the number L
being smaller than N/2 and the samples of a frame
having ranks i numbered from 0 to N-1, the
synthesis window f_s(i) satisfies
f_s(N-L+i) + f_s(i) = A for 0 ≤ i < L, and is equal to A
for L ≤ i < N-L, A being a positive constant.

8. The method as claimed in claim 7, in which the
synthesis window f_s(i) increases from 0 to A for i
going from 0 to L.

9. The method as claimed in claim 8, in which the
synthesis window f_s(i) for 0 < i < L is a raised
half-sinusoid.
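
The window condition of claims 7 to 9 can be checked numerically with the sketch below. It rests on several assumptions that are not in the claims: the exact raised half-sinusoid formula, inclusive lower bounds on the index ranges (0 <= i < L and L <= i < N-L), a Hamming analysis window, and idealized evaluated frames that equal the weighted input frames (the frequency-domain transform is skipped). All function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming analysis window f_A; it never reaches zero, so division is safe."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def synthesis_window(n, overlap, amplitude=1.0):
    """f_s: raised half-sinusoid ramps of length L around a flat top equal to A."""
    if not 0 < overlap < n / 2:
        raise ValueError("overlap L must satisfy 0 < L < N/2")
    window = [amplitude] * n
    for i in range(overlap):
        rise = 0.5 * amplitude * (1.0 - math.cos(math.pi * i / overlap))
        window[i] = rise                            # rising edge
        window[n - overlap + i] = amplitude - rise  # complementary falling edge
    return window


def overlap_add(frames, hop, analysis, synthesis):
    """Divide each frame by f_A, multiply by f_s, and sum with overlap."""
    n = len(analysis)
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for k, frame in enumerate(frames):
        for i in range(n):
            out[k * hop + i] += frame[i] / analysis[i] * synthesis[i]
    return out


# Illustrative values only: N = 16, L = 4, A = 1.
N, L, A = 16, 4, 1.0
hop = N - L
f_a = hamming(N)
f_s = synthesis_window(N, L, A)
signal = [math.sin(0.3 * t) for t in range(3 * hop + N)]
frames = [[signal[k * hop + i] * f_a[i] for i in range(N)] for k in range(4)]
rebuilt = overlap_add(frames, hop, f_a, f_s)
```

Away from the first and last L samples, `rebuilt` equals A times the input signal, because the complementary ramps of neighbouring frames sum to A in every overlap region.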

10. A method of synthesizing an audio signal, in which
a set of successive overlapping frames of N
samples of the audio signal which are weighted by
an analysis window (f_A) is evaluated, by
transforming in the time domain spectral estimates
(Y) corresponding respectively to said frames, and
the evaluated frames are combined to form the
synthesized signal (x), characterized in that, for
a subset of the evaluated frames, the spectral
estimates are obtained by processing synthesis
parameters (cx_sup_q, cx_inf_q, Emix) respectively
associated with the frames of said subset while,
for the frames which do not form part of the
subset, the spectral estimates are obtained with
an interpolation of a part at least of the
synthesis parameters, in that the successive
frames of said subset exhibit mutual time shifts
of M samples, the number M being larger than N/2,
while the successive frames of said set exhibit
mutual time shifts of M/p samples, p being an
integer larger than 1, in that each evaluated
frame is modified by applying thereto a processing
corresponding to a division by said analysis
window (f_A) and to a multiplication by a synthesis
window (f'_s), and the synthesized signal is formed
as an overlap sum of the modified frames, and in
that, the samples of a frame having ranks i
numbered from 0 to N-1, the synthesis window
f'_s(i) has a support limited to the ranks i
ranging from N/2 - M/p to N/2 + M/p and satisfies
f'_s(i) + f'_s(i + M/p) = A for N/2 - M/p < i < N/2,
A being a positive constant.






11. The method as claimed in claim 10, in which the
synthesis window f'_s(i) increases for i ranging
from N/2 - M/p to N/2.

12. The method as claimed in claim 11, in which the
synthesis window f'_s(i) for
N/2 - M/p < i < N/2 + M/p is a raised sinusoid.
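
The narrower window of claims 10 to 12 can be sketched in the same spirit. The raised-sinusoid formula below is an assumption (a full raised cosine centred on N/2), as are the function name, the requirement that N be even and M/p an integer, and the inclusive lower bound on the index range.

```python
import math


def centered_synthesis_window(n, shift, amplitude=1.0):
    """f'_s: raised sinusoid on [n/2 - shift, n/2 + shift], zero elsewhere."""
    center = n // 2
    if not 0 < shift <= center:
        raise ValueError("shift M/p must satisfy 0 < M/p <= N/2")
    window = [0.0] * n
    for i in range(center - shift, min(center + shift + 1, n)):
        phase = math.pi * (i - center) / shift
        window[i] = 0.5 * amplitude * (1.0 + math.cos(phase))
    return window


# Illustrative values only: N = 256, M = 160 (larger than N/2), p = 2.
N, M, p, A = 256, 160, 2, 1.0
shift = M // p  # time shift between successive frames of the full set
w = centered_synthesis_window(N, shift, A)
```

Because cos(x) + cos(x + pi) = 0, two copies of this window offset by M/p samples always sum to A, which is the condition stated in claim 10.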

13. The method as claimed in any one of claims 10 to
12, in which data (icx_q[n-1/2]) representing an
interpolation error (ecx_q[n-1/2]) are associated
with the frames which do not form part of said
subset, and are used to correct at least one of
the interpolated synthesis parameters
(cx_i[n-1/2]).

14. The method as claimed in any one of claims 10 to
12, in which data (iP) representing an
interpolator filter (128) are associated with the
frames which do not form part of said subset, and
are used to interpolate at least one of the
synthesis parameters.

15. The method as claimed in any one of claims 10 to
14, in which the synthesis parameters comprise
cepstral coefficients (cx[n]) subjected to the
interpolation.



16. An audio processing device, comprising synthesis
means for executing a method as claimed in any one
of claims 7 to 15.